Beijer Electronics (formerly QSI Corporation)

Manufacturer of Mobile Data and Human Machine Interface Terminals.

All times are UTC - 7 hours




PostPosted: Mon Aug 25, 2014 11:34 am 

Joined: Thu Jun 14, 2007 9:05 am
Posts: 98
Location: Montreal - PQ
All systems are identical: an A12 HMI running Qlarity, and a Wago PLC.

We have an issue with one HMI that stops working about 3 times a week.

Other HMIs at the same site have the same issue, but only about once a month.

The symptom is a Modbus communication failure, and it's Modbus TCP (MBTCP).

The terminal tries to reconnect but never succeeds.

At this point the FTP Server in the HMI still works and we can PING it.

After capturing with Wireshark for a long time, we caught the failure and are now able to reproduce it on demand.

From what we see, a packet is lost and the communication never resets. To reproduce the situation, we disconnect the Ethernet cable while the HMI is exchanging data; once in a while, doing this brings the problem back.

It is as if the socket would not close or something.

If we reboot the HMI, it just does the same thing over and over again.

To reset the communication, we have to delay the startup of the MBTCP communication until 15 seconds AFTER the HMI has rebooted. Then everything works OK.

Attached is a Wireshark log of the situation.

Our client does not have the best cabling setup. In the shop it's Cat5 cable tie-wrapped to pipes on the walls.
The HMI with the most problems is the one with the longest cable... 100 feet.
We have installed a switch between the PLC and the HMI because we think the packet loss is due to the cabling... BUT,
why would the communication not just reset itself when it has failed that badly? An occasional packet loss should be easy to recover from.

We are not communication specialists but then again...
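
For readers wondering what "resetting itself" could look like, here is a minimal Python sketch (not Qlarity code, and not a description of how the A12 actually behaves). It assumes a client that closes its socket on any error or timeout and then opens a fresh connection after a growing delay. The class name `ReconnectingModbusClient`, the timeout and the retry values are all hypothetical.

```python
import socket
import time


class ReconnectingModbusClient:
    """Hypothetical helper: one Modbus TCP connection, rebuilt after any error."""

    def __init__(self, host, port=502, timeout=2.0,
                 connect=socket.create_connection, sleep=time.sleep):
        self.host, self.port, self.timeout = host, port, timeout
        self._connect, self._sleep = connect, sleep
        self.sock = None

    def _drop(self):
        # Always close the old socket so a half-dead connection cannot linger.
        sock, self.sock = self.sock, None
        if sock is not None:
            sock.close()

    def _recv_exact(self, count):
        # recv() may return fewer bytes than requested, so loop until done.
        data = b""
        while len(data) < count:
            chunk = self.sock.recv(count - len(data))
            if not chunk:
                raise ConnectionError("peer closed the connection")
            data += chunk
        return data

    def request(self, frame, retries=3, backoff=0.5):
        last_error = None
        for attempt in range(retries):
            try:
                if self.sock is None:
                    self.sock = self._connect((self.host, self.port),
                                              timeout=self.timeout)
                self.sock.sendall(frame)
                header = self._recv_exact(7)                 # MBAP header
                length = int.from_bytes(header[4:6], "big")  # unit id + PDU
                return header + self._recv_exact(length - 1)
            except OSError as exc:  # covers timeouts and ConnectionError
                last_error = exc
                self._drop()
                self._sleep(backoff * 2 ** attempt)
        raise ConnectionError(f"no reply after {retries} attempts") from last_error
```

The point of the sketch is the `_drop()` call: the old socket is always closed before a new connection is attempted, so a connection that was broken by a pulled cable cannot block the retry.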


Attachments:
File comment: Wireshark
error 11h51-38_trim.7z [11.57 KiB]
Downloaded 275 times

_________________
If it looks like a Cat ...
 
PostPosted: Tue Aug 26, 2014 2:45 pm 
QSI Support

Joined: Wed Mar 08, 2006 12:25 pm
Posts: 881
Location: Salt Lake City, Utah
I am going to take some guesses here:

10.16.1.151 is the A12
10.16.1.152 is the controller (Wago)

The funny thing that I see is that every five seconds or so the A12 initiates a TCP connection to the controller for Modbus. It makes a single request to read coil 4096 then closes the TCP connection.
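
As an illustration of the single request described above, here is a small Python sketch that builds a Modbus TCP "Read Coils" (function code 0x01) frame. It assumes that "coil 4096" is the zero-based address sent on the wire, and that one coil is read with unit ID 1; the capture may use different values. The function name `read_coils_request` is made up for this example.

```python
import struct


def read_coils_request(transaction_id, unit_id, address, quantity):
    """Build a Modbus TCP Read Coils (0x01) request frame."""
    # PDU: function code, starting address, number of coils (all big-endian).
    pdu = struct.pack(">BHH", 0x01, address, quantity)
    # MBAP header: transaction id, protocol id (always 0),
    # length of what follows (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


# Assumed values: transaction 1, unit 1, address 4096, one coil.
frame = read_coils_request(1, 1, 4096, 1)
print(frame.hex())  # 000100000006010110000001
```

In Wireshark, a request like this shows up as a 12-byte Modbus/TCP payload sent to port 502.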

This is not typical behavior for a Qlarity Modbus scenario. Normally Qlarity will leave the TCP connection open unless there was a communication error.

Does a communication error occur? Normally this is displayed as a small banner at the top of the screen that appears for a few seconds. Be sure that your ModbusComm object has the MessageFilter property set to _cc_showallmessages. (You don't want to suppress communication error messages, or you won't know that they happened.)

Now, I don't see anything in the Wireshark capture you sent to indicate that the A12 ever stopped talking to the controller. For the entire duration of the capture (roughly 7 1/3 minutes), the A12 successfully establishes a new TCP connection to the controller, and eventually shuts it down. If the capture lasted longer than seven and a half minutes, then it is possible that the A12 lost network function entirely (possible hardware glitch or device driver error at a layer deeper than Qlarity).

_________________
Jeremy
http://www.beijerinc.com


 
PostPosted: Wed Aug 27, 2014 9:07 am 

Joined: Thu Jun 14, 2007 9:05 am
Posts: 98
Location: Montreal - PQ
If you look at time 11h50:47, this is where the communication error occurs. We get the red label stating TCPIP loss, retry connection.

After that it never works again. We can still ping the HMI and access it through FTP, but MBTCP is not working.

================================
One observation to add.

We programmed one button to open the MBTCP com channel and another one to close it.

So when we start the HMI, all read data show a value of zero or default values, which is what we expect. Then we press the button to initiate the MBTCP conversation... and we see values changing.

Now if we reproduce the bug, we lose MBTCP com and cannot get it to work again, even if we press the close channel button.

If we reboot the HMI, the communication starts correctly ONLY because it is not programmed to connect at startup. It should not restart on its own, because we should need to press the start button to open that channel. But it does, and it works fine.

If we program the com to connect at startup, resetting or rebooting never gets the communication back... once it has failed, it takes hours of rebooting to get MBTCP to connect.

So the dirty solution, to at least be able to reset that HMI, is to add a 5-second delay before opening the MBTCP channel AFTER startup... works EVERY time.
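
The delayed-start workaround can be sketched roughly as follows. This is illustrative Python, not Qlarity code; the class name `DelayedStart` and the `open_channel` callback are hypothetical stand-ins for whatever the application uses to open the MBTCP channel. It assumes the application has some periodic event from which `poll()` can be called.

```python
import time


class DelayedStart:
    """Open a channel once, after a fixed delay counted from construction."""

    def __init__(self, open_channel, delay_s=5.0, clock=time.monotonic):
        self._open = open_channel
        self._clock = clock
        self._deadline = clock() + delay_s
        self.started = False

    def poll(self):
        # Call this periodically; it opens the channel the first time
        # the deadline has passed and does nothing afterwards.
        if not self.started and self._clock() >= self._deadline:
            self._open()
            self.started = True
        return self.started
```

Polling against a deadline, rather than sleeping for 5 seconds, keeps the rest of the application responsive while waiting.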


Attachments:
File comment: WireShark 2
error 11h50m47-998139.7z [16.84 KiB]
Downloaded 280 times

_________________
If it looks like a Cat ...
 
PostPosted: Wed Aug 27, 2014 12:28 pm 
QSI Support

Joined: Wed Mar 08, 2006 12:25 pm
Posts: 881
Location: Salt Lake City, Utah
I think I see where you mean that communication has problems, starting in packet number 1017 (for what it is worth, the time index is somewhat problematic, as I am in a different time zone than you).

Aside from your button to start Modbus communication, do you have any code to disable the communication object or close the connection?

What exactly does the red banner say?

Does the red banner remain or does it disappear? Does it disappear and reappear?

_________________
Jeremy
http://www.beijerinc.com


 
PostPosted: Wed Aug 27, 2014 1:16 pm 

Joined: Thu Jun 14, 2007 9:05 am
Posts: 98
Location: Montreal - PQ
Yes, we have 2 buttons: one to start and one to close the communication.

They both work fine, but the "Close" button does not work once the communication has failed.

The red banner appears and then stays there permanently.

It says:

"TCPIP connection lost. Attempting to reconnect"



Jeremy wrote:
I think I see where you mean that communication has problems, starting in packet number 1017 (for what it is worth, the time index is somewhat problematic, as I am in a different time zone than you).

Aside from your button to start Modbus communication, do you have any code to disable the communication object or close the connection?

What exactly does the red banner say?

Does the red banner remain or does it disappear? Does it disappear and reappear?

_________________
If it looks like a Cat ...


 
PostPosted: Wed Aug 27, 2014 2:05 pm 
QSI Support

Joined: Wed Mar 08, 2006 12:25 pm
Posts: 881
Location: Salt Lake City, Utah
Any chance that this particular message comes from your code? I cannot find it anywhere in our code.

_________________
Jeremy
http://www.beijerinc.com


 
PostPosted: Fri Sep 05, 2014 7:36 am 

Joined: Thu Jun 14, 2007 9:05 am
Posts: 98
Location: Montreal - PQ
Jeremy wrote:
Any chance that this particular message comes from your code? I cannot find it anywhere in our code.


Nope!

We have added a local switch to eliminate the distance factor from the Ethernet cables.

But in the last week we got the bug again, 4 times.

Attached is a picture of the problem as it appears.


Attachments:
photo1.jpg [ 206.03 KiB | Viewed 5874 times ]

_________________
If it looks like a Cat ...
 
PostPosted: Wed Sep 10, 2014 10:36 am 
QSI Support

Joined: Wed Mar 08, 2006 12:25 pm
Posts: 881
Location: Salt Lake City, Utah
Looking at your dumps, I don't really see anything wrong at the protocol level. There don't seem to be any dropped packets, and the connection to the controller appears to shut down cleanly then reopen cleanly as well.

It is almost like there is code in the Qlarity application that starts to repeatedly shut down the connection when some condition is met, and then the application gets into a cycle where it cannot quite get off the ground again.

One thing that does appear odd is that there are a very large number of coil writes going on. Is it really necessary to write dozens of coils every second?
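
If the coils being written are contiguous, one way to cut the traffic is to send them in a single "Write Multiple Coils" request (function code 0x0F) instead of many single-coil writes. The Python sketch below only shows what such a frame looks like on the wire. Whether the Qlarity ModbusComm object can batch writes this way is not something established in this thread, and the function name `write_multiple_coils_request` is made up for the example.

```python
import struct


def write_multiple_coils_request(transaction_id, unit_id, address, values):
    """Build a Modbus TCP Write Multiple Coils (0x0F) request frame."""
    quantity = len(values)
    if not 1 <= quantity <= 1968:  # protocol limit for one request
        raise ValueError("quantity must be between 1 and 1968")
    # Pack coil states into bytes, first coil in the least significant bit.
    packed = bytearray((quantity + 7) // 8)
    for i, value in enumerate(values):
        if value:
            packed[i // 8] |= 1 << (i % 8)
    # PDU: function code, start address, quantity, byte count, coil data.
    pdu = struct.pack(">BHHB", 0x0F, address, quantity, len(packed)) + bytes(packed)
    # MBAP header: transaction id, protocol id 0, length (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


# Ten coils starting at address 0, sent as one 15-byte frame.
states = [True, False, True, True, False, False, False, False, True, False]
print(write_multiple_coils_request(2, 1, 0, states).hex())
# 000200000009010f0000000a020d01
```

Ten separate single-coil writes would each need their own request and response, so batching reduces the number of round trips to one.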

_________________
Jeremy
http://www.beijerinc.com


 