Professional OPC
Development Tools

logos

Performance issues

More
13 Jun 2016 12:21 #4129 by support
Replied by support on topic Performance issues
If, in the first case, other threads in the process themselves already consume near 100% CPU for extended periods of time (several seconds), I do not see it as an issue that we should be resolving. Ideally it would behave as best as possible, but it's not a case we design for, and there even may be reasons beyond our control that cause the negative "backfire".

Please Log in or Create an account to join the conversation.

More
13 Jun 2016 11:37 #4128 by juanbernabeu
Replied by juanbernabeu on topic Performance issues
Indeed, we see two cases:
1) Our process is busy (due to initialization) and adding the startup of your component makes the CPU stay on 100% much longer than usual
2) A OPC write tag command is issue towards your component and the CPU goes to 100% for a relativelly long period of time

What we propose is to simulate case 1 by creating a process that does something dummy (for a short period of time), and then start your component. We suspect that you might see a similar pattern, and then help you to debug it

Please Log in or Create an account to join the conversation.

More
13 Jun 2016 11:16 #4127 by support
Replied by support on topic Performance issues
I am sorry but I do not understand your request.

My understanding was that you are observing CPU being consumed for unknown reason. And we want to find the cause. How would creating something that artificially consumes CPU help in investigating the issue?

Please Log in or Create an account to join the conversation.

More
13 Jun 2016 11:07 #4126 by juanbernabeu
Replied by juanbernabeu on topic Performance issues
Would it be possible for you to create a little app that would create extra threads (like 20-30 of them) that does dummy work (and on their own gets near 100% CPU utilization), and then start the component. Maybe that would be a way to reproduce the issue

Please Log in or Create an account to join the conversation.

More
10 Jun 2016 14:10 #4118 by support
Replied by support on topic Performance issues
Thank you. I really have not much more to say or help, as it seems to be necessary to be able to debug the process and figure out what is actually happening.

When I make a simple test using e.g. one our VBScript examples (which also use the COM API of QuickOPC), I do see approx. 20 threads that look like managed threads (created from clr.dll), so I think such number can be considered normal. There is approx. 20 other threads too. Besides an initial short spike, the CPU consumption (while being subscribed and consuming the notifications) stays near zero (less than 1%). The number of context switches on any thread is usually below 100, bud definitely below 1000, after 1 minute of running. So there is definitely something weird about them in your case.

I am sorry but I do not have more info at the moment.

Regards

Please Log in or Create an account to join the conversation.

More
08 Jun 2016 07:25 #4112 by juanbernabeu
Replied by juanbernabeu on topic Performance issues
See comments inline

support wrote: Thank you for the answers.

Based on what you have stated, it looks less likely me to me know that .NET garbage collector has to do with the issue. I was missing, however, the "# Gen 0/1/2 Collections" counters form your picture - those, I believe, would tell us for sure whether the GC collections are taking place or not.

[A] The # Get 1/2 Collections are zero so they aren’t visible on the graph (just a flat line at the bottom). The graph just shows “100%” so the attached “gc_allocs.csv” contains all collected .Net samples. It indicates that # Gen 0 Collections is one.

I am surprised by the number of context switches. My original thinking was that the CPU is loaded with a thread that makes a tight loop (or more such threads), but it could be simply the context switches themselves that are consuming the CPU cycles. But it's not clear what can be causing them. BTW, I assume that when you say the CPU usage is near 100%, that actually *all* the cores seem to be used that much? - not just some of them?

[A] yes, all cores go 100%

The number of threads is also unexpectedly high. To my knowledge we do not (intentionally) create 30 or so threads just for a simple Subscribe. I assume you not creating many EasyUAClient, right? It would really be useful to know "who" has created them and which purpose they serve - I can then help you correlate it with what we think is the "good" state, and which threads belong to us (in good faith), and which are of (yet) unknown origin.

[A] We only create a single EasyUAClient. When running our application in the Delphi debugger (and starting it in a minimal way without any clients, extra workers etc), and delaying the OPC subscription, then there are 18 threads when it finishes its startup and everything settles.
When the OPC subscription is then allowed to proceed, about 30 threads gets created.
In addition, it then continuously seems to create a few threads (1-3) which runs for a second or two and stops.
If I run the “UADocExamples.exe” sample code for Delphi that you provide, I also see some 30 threads being created.

Best regards

Attachments:

Please Log in or Create an account to join the conversation.

More
07 Jun 2016 13:27 #4092 by support
Replied by support on topic Performance issues
Thank you for the answers.

Based on what you have stated, it looks less likely me to me know that .NET garbage collector has to do with the issue. I was missing, however, the "# Gen 0/1/2 Collections" counters form your picture - those, I believe, would tell us for sure whether the GC collections are taking place or not.

I am surprised by the number of context switches. My original thinking was that the CPU is loaded with a thread that makes a tight loop (or more such threads), but it could be simply the context switches themselves that are consuming the CPU cycles. But it's not clear what can be causing them. BTW, I assume that when you say the CPU usage is near 100%, that actually *all* the cores seem to be used that much? - not just some of them?

The number of threads is also unexpectedly high. To my knowledge we do not (intentionally) create 30 or so threads just for a simple Subscribe. I assume you not creating many EasyUAClient, right? It would really be useful to know "who" has created them and which purpose they serve - I can then help you correlate it with what we think is the "good" state, and which threads belong to us (in good faith), and which are of (yet) unknown origin.

Best regards

Please Log in or Create an account to join the conversation.

More
06 Jun 2016 15:03 #4084 by juanbernabeu
Replied by juanbernabeu on topic Performance issues
See answers inline [A]

support wrote: Dear Sirs,
thank you for your interest in our products.

No, this is not a known problem. In fact, even if there were issues with high CPU usage under some conditions, I would never expect them with such a small load on the system (a;ready with around 6 subscribed items, if I understand it well). My general feeling that it is going to be some weird interaction between the subsystem involved - such as the Delphi runtime + .NET CLR + Windows service. But I do not know what it is at the moment.

I would definitely suggest to obtain more information first. Below are some questions and suggestions (and things to confirm).

1. Is it true, as I wrote above, that the issue already happens with small number of subscrined items?

[A] Yes. This even happens with just 3 subscribed items. We did not test using more tags as it was problematic enough as it is.

2. Do I understand it correctly that the high CPU usage occurs in two cases: a) when your clients connect at the same time when SubscribeMultipleMonitoredItems I being processed, and b) when any data change notification arrives ?

[A] Yes. If we delay the subscribe method until all clients are synchronized, then we don’t see that the CPU gets pegged at 100%.

3. I have a slight suspicion that .NET garbage collector may have something with - or possibly not, but we should rule it out first. Please use PERFMON to observer the process, and its GC-related characteristics, and figure out whether there isn't a correlation between .NET GC collections, and the periods of the high CPU usage. The related counters are mainly under ".NET CLR Memory": # Gen n Collections, % Time in GC, # Bytes in all Heaps, #GC Handles etc. Other .NET counters in other groups may be interesting as well.

[A] From local tests (not at customer premises), we see that ‘# GC handles’ is quite stable, the ‘# Bytes in all heaps’ is zero until the end of the 100% CPU spike, and then goes up a little bit at the same time as ‘Allocated Bytes/sec’ goes up a bit and then back down to zero (see attached image of a run where the subscribe method was called after all clients have connected and the CPU was quiet. Perfmon file can be provided if needed).

4. Thank you for the provided "log entries". Aren't there some more entries later, specifically near/during/short after at the periods of high CPU usage?

[A] No, we’ve not really seen any logs during the updates. We cannot reproduce this behavior in our test environemnt, which makes it hard to do tests on this aspect of the problem.

5. Have you tried to use the debugger to see which thread(s) are responsible for the high CPU usage, and where do they spend their time (call stack)? If not, can you do that?

[A] We’ve been busy doing this, but it’s quite hard and tedious as there are quite a lot of threads. We have ~60 threads without the OPC client, and it seems to add ~30 more. Some theads start and then stop, and it’s quite hard to know which thread belongs where and whether it’s relevant or not.

When we delay the start of the subscription, we can see that our own threads do indeed work hard (as they should), but only for a short period of time (a few seconds).
When the (delayed) subscription call is done, the OPC threads also do some work, but the CPU load is manageable and short. However, doing both at the same time seems causes complete CPU utilization for quite a long while (10s of seconds).

There are also a fair (lot?) amount of context switching going on when the system is busy. Most threads just do a few 100s, but several threads do several 10.000s of them. Some do it for several seconds, others have just a single peak and then goes quiet.

As you said earlier, it might be that the .Net runtime together with COM and native threads are stumbling over each other fighting for CPU. During testing some bottlenecks in our own code was found, but there’s a limit to how much we can optimize/postpone before things aren’t working anymore (the duration of the CPU usage is shorter, but still significant).

Curiously, every once in a while (1 out of 10 test runs?) everything seems to align nicely and we don’t see any problematic CPU usage.

6. Similarly to that, it may be a good idea to use (Sysinternals) Procmon to obtain thread/CPU info at the time of high CPU usage.

[A] Good tip, but sadly Procmon doesn’t really work since the system is so heavily loaded that it cannot capture anything. We’re doing it via Performance Monitor, which is a bit clumsy but sort of works.

Thank you

Attachments:

Please Log in or Create an account to join the conversation.

More
06 Jun 2016 06:17 #4083 by support
Replied by support on topic Performance issues
Dear Sirs,
thank you for your interest in our products.

No, this is not a known problem. In fact, even if there were issues with high CPU usage under some conditions, I would never expect them with such a small load on the system (a;ready with around 6 subscribed items, if I understand it well). My general feeling that it is going to be some weird interaction between the subsystem involved - such as the Delphi runtime + .NET CLR + Windows service. But I do not know what it is at the moment.

I would definitely suggest to obtain more information first. Below are some questions and suggestions (and things to confirm).

1. Is it true, as I wrote above, that the issue already happens with small number of subscrined items?

2. Do I understand it correctly that the high CPU usage occurs in two cases: a) when your clients connect at the same time when SubscribeMultipleMonitoredItems I being processed, and b) when any data change notification arrives ?

3. I have a slight suspicion that .NET garbage collector may have something with - or possibly not, but we should rule it out first. Please use PERFMON to observer the process, and its GC-related characteristics, and figure out whether there isn't a correlation between .NET GC collections, and the periods of the high CPU usage. The related counters are mainly under ".NET CLR Memory": # Gen n Collections, % Time in GC, # Bytes in all Heaps, #GC Handles etc. Other .NET counters in other groups may be interesting as well.

4. Thank you for the provided "log entries". Aren't there some more entries later, specifically near/during/short after at the periods of high CPU usage?

5. Have you tried to use the debugger to see which thread(s) are responsible for the high CPU usage, and where do they spend their time (call stack)? If not, can you do that?

6. Similarly to that, it may be a good idea to use (Sysinternals) Procmon to obtain thread/CPU info at the time of high CPU usage.

Thank you

Please Log in or Create an account to join the conversation.

More
03 Jun 2016 09:17 - 03 Jun 2016 10:21 #4078 by juanbernabeu
Hi there,

We are having a CPU usage problem with our component at one of our clients. We recently installed it there and we see the CPU load go to 100% for an extended period (several minutes or longer) before returning to normal levels (a few %) when subscribing to 6 items.
When one of the tags is written to, the CPU load goes to 100% again, for a long time (from 10s of seconds to several minutes). Due to the heavy load, our service starts having problems handling its proper work and coordinating the clients that are connected.

The server running the OPC client where the problem occurs has an Intel Xeon E5-2660 at 2.20 GHz (2 quad code CPUs).

Our setup is a bit complex:
We are using the OPC UA client via COM (as we are using Delphi XE3). The OPC client is created in a dynamically loaded plugin inside our own Windows service.
Our service also handles a bunch of our own clients, which connect over the network.
The OPC client connects to a KEPServerEX v5.19.467.0, which runs in a proxy mode (behind it is an OPC DA server).
The clients does nothing related to the OPC client, nor does the OPC client talk with the clients. We detect OPC tag changes, and in response to which other threads might notify various clients.

The problem seems to happen mainly when we call SubscribeMultipleMonitoredItems() and if at the same time there are clients connecting to our service.
When they connect, some processing and synchronisation between the clients occurs which can take a lot of CPU for a short period of time (depending on number of clients, this can be a few seconds).

We have done a lot of various tests:
We connected to a demo version of the KEPServerEX, which gives the same problems, but the duration of the CPU load is shorter (10s of seconds instead of minutes).
We connected to a demo version (UaAnsiCServer) from Unified Automation, with the same result as above.
We removed our plugin (and thus the OPC client), then the problem goes away.
We stopped our plugin from doing actions when updates are received (i.e. not take any actions, and do no notifications when a tag change is received), but the problem persists.
We blocked all clients from accessing our service, then there problem goes away.
If we allow a single client, we have a short burst of 100% CPU usage which is longer than usual, but not so dramatic as to cause problems.
If we allow two clients, the burst is a bit longer, but still manageable.
From three clients onwards, the load becomes much longer and starts causing problems.
The same things happens even if we just connect a single client at a time and wait for the CPU to come to rest before connecting the next one. At about 3 clients, the CPU load becomes problematic.
Even when we get 3 clients to connect and everything is settled down, the moment a OPC tag is updated, the CPU goes to 100% for a long time.
We tried to delay the call to SubscribeMultipleMonitoredItems(), so that clients have a chance to connect before it happens. This had an good impact, the CPU load is acceptable when the clients connect and when SubscribeMultipleMonitoredItems() is called.
However, if clients connect after SubscribeMultipleMonitoredItems(), we see that the CPU usage goes to 100% for an extended period of time.
Also, if a single OPC value is changed, the CPU usage goes to 100% for quite a while before going down.

We have the impression that the more clients we have connected, the longer the CPU load will stay at 100%.

We can sort of reproduce parts of the problem at our office, namely the high CPU usage if clients connect whilst the OPC client tries to subscribe to events.
However, locally it's much shorter in period and we do not see the high CPU load when OPC tags are updated.

Is this a known problem or do you have any ideas what we can do to debug/fix this issue?

We have enabled the OPC client logging, but it does not really say much:
            2016-05-27 15:28:33  OpcUaClient: Information: OPC-UA engine application info 'Process' -> Is64BitProcess: False, CurrentDirectory: "C:\Windows\system32", Id: 3976.
            2016-05-27 15:28:33  OpcUaClient: Information: OPC-UA engine application info 'Clr' -> Version: '4.0.30319.42000', PtrSize: 4.
            2016-05-27 15:28:33  OpcUaClient: Information: OPC-UA engine application info 'AppDomain' -> Id: 1.
            2016-05-27 15:28:33  OpcUaClient: Information: OPC-UA engine application info 'Debugger' -> IsAttached: False.
            2016-05-27 15:28:33  OpcUaClient: Information: OPC-UA engine application info 'AssemblyProperties' -> FullName: "OpcLabs.BaseLib, Version=5.35.1173.1, Culture=neutral, PublicKeyToken=6faddca41dacb409", Location: "C:\Program Files (x86)\OPC Labs\QuickOPC 5.3\Assemblies\OpcLabs.BaseLib.dll", GlobalAssemblyCache: False.
            2016-05-27 15:28:33  OpcUaClient: Information: OPC-UA engine application info 'User' -> Name: "SYSTEM", DomainName: "HAVEN", Interactive: False.
            2016-05-27 15:28:33  OpcUaClient: Information: OPC-UA engine application info 'AssemblyAttributes' -> AssemblyConfiguration: "Release".
            2016-05-27 15:28:33  OpcUaClient: Information: OPC-UA engine application info 'OSConfiguration' -> MachineName: "HSVISION2", SystemPageSize: 4096.
            2016-05-27 15:28:33  OpcUaClient: Information: OPC-UA engine application info 'OperatingSystem' -> VersionString: "Microsoft Windows NT 6.2.9200.0".
            2016-05-27 15:28:33  OpcUaClient: Information: OPC-UA engine application info 'Environment' -> Is64BitOperatingSystem: True, SystemDirectory: "C:\Windows\system32".
            2016-05-27 15:28:33  OpcUaClient: Information: OPC-UA engine application info 'Computer' -> ProcessorCount: 8.
            2016-05-27 15:28:33  OpcUaClient: Information: The OPC-UA engine is connecting to the underlying subsystems.
            2016-05-27 15:28:33  OpcUaClient: Information: OPC-UA engine SDK configuration successfully provided by "OpcLabs.EasyOpc.UA.Toolkit.InternalSdkConfigurationProvider".
            2016-05-27 15:28:33  OpcUaClient: Information: The OPC-UA engine has determined the client certificate parameters as listed below.<010>SubjectName: CN=SightVision Context Server service, DC=HSVision2<010>ApplicationName: SightVision Context Server service<010>ApplicationUri: urn:HSVision2:ContextServer:3.1.6.1<010>ProductUri: urn:literal:string:SightVision
            2016-05-27 15:28:33  OpcUaClient: Information: The OPC-UA engine has successfully connected to the underlying subsystems.
            2016-05-27 15:28:33  OpcUaClient: Information: The OPC-UA client session with connect sequence number 1 is connecting to endpoint URL "opc.tcp://192.168.27.11:49320".
            2016-05-27 15:28:33  OpcUaClient: Information: OPC-UA endpoint selection on discovery URL "opc.tcp://192.168.27.11:49320" candidate report (3 entries) follows.<013><010>[0:Winner(1000+0), 1:EligibleNotAccepted(80), 2:EligibleNotAccepted(90)]. OpcUaClient: Warning: The OPC-UA server endpoint URL "opc.tcp://192.168.27.11:49320" with security policy URI "http://opcfoundation.org/UA/SecurityPolicy#None" and transport policy URI "http://opcfoundation.org/UA-Profile/Transport/uatcp-uasc-uabinary", selected using discovery on endpoint URL "opc.tcp://192.168.27.11:49320", has zero security level. Using this endpoint is not recommended and is only supported for backward compatibility. This happens when no other endpoints are available and eligible for selection.
            2016-05-27 15:28:33  OpcUaClient: Information: The OPC-UA client session with connect sequence number 1 has selected an OPC-UA endpoint for discovery URL "opc.tcp://192.168.27.11:49320" with following parameters:<013><010>Endpoint URL "opc.tcp://192.168.27.11:49320", message security mode 'None', security policy URI "http://opcfoundation.org/UA/SecurityPolicy#None", transport profile URI "http://opcfoundation.org/UA-Profile/Transport/uatcp-uasc-uabinary".
            2016-05-27 15:28:33  OpcUaClient: Information: The OPC-UA client session with connect sequence number 1 user token selection on endpoint URL "opc.tcp://192.168.27.11:49320" candidate report (2 entries) follows.<010>[0:Winner(""), 1:ValidNotEligible(NotApplicable("http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0"))].
            2016-05-27 15:28:34  OpcUaClient: Warning: The server certificate failed validation with error code 0x801A0000 while establishing OPC-UA client session to endpoint URL "opc.tcp://192.168.27.11:49320". <013><010>  BadCertificateUntrusted 'Certificate is not trusted.<013><010>SubjectName: CN=KEPServerEX/UA Server, O=Microsoft, C=BE, DC=WIN-89NQ4ODUUES<013><010>IssuerName: CN=KEPServerEX/UA Server, O=Microsoft, C=BE, DC=WIN-89NQ4ODUUES' OpcUaClient: SuccessAudit: The server certificate has been accepted for the OPC-UA client session to endpoint URL "opc.tcp://192.168.27.11:49320" by a fallback method.<013><010>  Certificate information: [Subject]<013><010>  CN=KEPServerEX/UA Server, O=Microsoft, C=BE, DC=WIN-89NQ4ODUUES<013><010><013><010>[Issuer]<013><010>  CN=KEPServerEX/UA Server, O=Microsoft, C=BE, DC=WIN-89NQ4ODUUES<013><010><013><010>[Serial Number]<013><010>  9A600804<013><010><013><010>[Not Before]<013><010>  3/05/2016 16:59:48<013><010><013><010>[Not After]<013><010>  1/05/2026 16:59:48<013><010><013><010>[Thumbprint]<013><010>  1E6430DEED18E4C68CA2204118903465A41F3ECE
            2016-05-27 15:28:34  OpcUaClient: Information: The OPC-UA server on endpoint URL "opc.tcp://192.168.27.11:49320" returned an empty list of software certificates.
            2016-05-27 15:28:34  OpcUaClient: Warning: The OPC-UA server on endpoint URL "opc.tcp://192.168.27.11:49320" did not return a valid software certificate for product URI "urn:BCCZB-INT-CAM-1:Kepware.KEPServerEX.V5:UA Server" listed in server endpoints. The server returned 0 software certificate(s) in total.
            2016-05-27 15:28:34  OpcUaClient: Warning: The OPC-UA server on endpoint URL "opc.tcp://192.168.27.11:49320" returned an empty server signature.
            2016-05-27 15:28:34  OpcUaClient: Information: The OPC-UA client session with connect sequence number 1 has successfully connected to endpoint URL "opc.tcp://192.168.27.11:49320". The server-assigned session Id is "ns=1;g=b66f2fa0-cf14-468d-9fbc-1a3815f13374".
            2016-05-27 15:28:34  OpcUaClient: Information: When creating or modifying a monitored item, the OPC-UA server on endpoint URL "opc.tcp://192.168.27.11:49320" has revised the sampling interval to 5000 milliseconds; the requested sampling interval was 15000 milliseconds. Further such warnings on this session will not be logged.

We have also externalised the OPC functionality to a different process (using the same COM component) and thus communicate with our main application through RPC and that doesn't cause CPU issues. But as soon as we get your component within our main application (both on main and separated thread) the CPU goes high during the initialization and also during the write/read tags

Best regards
Last edit: 03 Jun 2016 10:21 by support. Reason: formatted log output

Please Log in or Create an account to join the conversation.

Moderators: support
Time to create page: 0.316 seconds

      

 Recommend this on Google