EA stalls & unwanted start() re-entries

MT4 Build 509

I have been struggling to get a handle on a problem with an EA, which presents as intermittent stalls while the EA is running.

Software Overview: I have an EA running in MT4 and a C# program running as an independent process. The EA sends short messages (~60 bytes) via a Windows mailslot to the C# program; there is no communication in the other direction. The EA does not currently execute any trade orders. Per-tick processing is handled by a small, simple state machine. Once every few minutes, custom calculations are performed before “clocking” the state machine. The calculations are simple and are based on past price data from a circular buffer of ~1200 entries maintained by the EA. Perhaps several times a day, the state machine detects an entry or exit condition, which triggers a mailslot message to the C# program. (If the send fails, ticks are skipped and the send is retried every 10 seconds.) Between an entry and an exit condition, the EA sends an update message to the C# program every 20 seconds. (If an update send fails, it is not retried.)
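
The send timing described above can be modeled as a small per-tick routine. This is a minimal C sketch, assuming times in whole seconds and a send function that simply reports success or failure; `SendState`, `queue_event`, `service_sends`, and the `stub_send`/`link_up` stand-in for the mailslot write are hypothetical names, not the EA's actual code.

```c
#include <assert.h>
#include <stdbool.h>

#define RETRY_INTERVAL_S  10  /* failed entry/exit send: retry every 10 s */
#define UPDATE_INTERVAL_S 20  /* while in a trade: one update every 20 s */

typedef struct {
    bool pending_event;   /* an entry/exit message has not been delivered yet */
    long next_retry_at;   /* earliest time (s) to attempt that delivery */
    bool in_trade;        /* between an entry and an exit condition */
    long next_update_at;  /* earliest time (s) for the next update message */
} SendState;

/* Hypothetical stand-in for the mailslot write: succeeds only while link_up. */
static bool link_up = true;
static int  send_attempts = 0;
static bool stub_send(void) {
    send_attempts++;
    return link_up;
}

/* Called when the state machine detects an entry (entering = true) or an exit. */
void queue_event(SendState *s, long now, bool entering) {
    s->pending_event  = true;
    s->next_retry_at  = now;                      /* first attempt right away */
    s->in_trade       = entering;
    s->next_update_at = now + UPDATE_INTERVAL_S;
}

/* Run once per tick. Returns false when the tick must be skipped because an
   entry/exit message is still undelivered; true when normal processing may run. */
bool service_sends(SendState *s, long now, bool (*try_send)(void)) {
    if (s->pending_event) {
        if (now < s->next_retry_at)
            return false;                         /* not time to retry yet */
        if (!try_send()) {
            s->next_retry_at = now + RETRY_INTERVAL_S;
            return false;                         /* message stays queued */
        }
        s->pending_event = false;                 /* delivered */
    }
    if (s->in_trade && now >= s->next_update_at) {
        (void)try_send();                         /* update: not retried on failure */
        s->next_update_at = now + UPDATE_INTERVAL_S;
    }
    return true;
}
```

Ticks that arrive while `service_sends` returns false are simply dropped, which matches the skip-and-retry behaviour described in the post.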

Symptoms: (Running with the EA attached to each of 20 charts.) Intermittent failures on some charts, while other charts appear to continue operating correctly. Failures include missed, delayed, or incorrect entry/exit messages and the cessation of update messages. The “outage” on a particular chart can last several hours. Eventually the outage ends and the chart/EA appears to start acting correctly again.

What I’ve tried: Logging calculation results to verify correctness. Reviewing the circular buffer handling code for buffer overruns. Comparing logs of EA message sends and C# message receives for discrepancies (though I have not yet caught an outage in these logs). Tracking and logging operating characteristics such as inter-tick intervals and start() wall-clock execution time. The start() wall-clock execution time is almost always < 1 second, but occasionally shows as 5-10 seconds. I’m guessing the longer times are due to the process being swapped out by the OS. From the timing resolution and the shorter times, I crudely estimate that my start() typically executes in about 1 millisecond.
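
One way to make a circular buffer overrun-proof by construction is to wrap the write index with a modulo and bounds-check every read against the number of stored entries. This is a minimal C sketch, assuming a buffer of doubles read as "k entries back from the newest"; `PriceRing`, `ring_push`, and `ring_get` are hypothetical names, not the EA's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define HIST_SIZE 1200   /* ~1200 entries, as in the post */

typedef struct {
    double price[HIST_SIZE];
    size_t next;    /* slot the next price will be written to */
    size_t count;   /* number of valid entries, at most HIST_SIZE */
} PriceRing;

/* Append a price, overwriting the oldest entry once the buffer is full. */
void ring_push(PriceRing *r, double p) {
    r->price[r->next] = p;
    r->next = (r->next + 1) % HIST_SIZE;   /* index can never leave [0, HIST_SIZE) */
    if (r->count < HIST_SIZE)
        r->count++;
}

/* Price k entries back (k = 0 is the newest). Returns 0 and leaves *out
   untouched if k is outside the stored history, instead of reading past it. */
int ring_get(const PriceRing *r, size_t k, double *out) {
    if (k >= r->count)
        return 0;
    size_t idx = (r->next + HIST_SIZE - 1 - k) % HIST_SIZE;
    *out = r->price[idx];
    return 1;
}
```

Because every read goes through `ring_get`, a request for more history than the buffer holds fails explicitly rather than silently returning stale or out-of-range data.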

In my desperation and paranoia, I have been considering concurrency issues in MT4. My start() is definitely not re-entrant. I have added the following code to the beginning of my start() implementation:

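// Acquire the per-symbol guard: succeeds only if the terminal global
// variable is 0 (free), and then sets it to 10 (held).
// Note: terminal global variables are shared by every chart, so two charts
// on the same symbol would share this guard.
if ( ! GlobalVariableSetOnCondition( "Start" + Symbol(), 10, 0 ) )
{
  // guard already held: treat as a re-entry and skip this tick
  reEntryCnt++;
  if ( reEntryCnt == 1 )
  {
    Alert( "Re-Entry detected for ", Symbol() );   // alert only on the first occurrence
  }
  return ( 0 );
}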

along with a GlobalVariableSet in init() and logging of the value of reEntryCnt to a file in deinit(). Most of my log files show a reEntryCnt value of 0, BUT a few show hundreds to tens of thousands of re-entry attempts, for what I understand to be a “should never occur” event. I have found no indication of such a bug on the web, and I do not find it credible that such a critical-seeming bug could exist and be unknown in such a widely distributed and used piece of software. How am I being delusional here, and what am I missing?

To add insult to injury, my crude re-entry guard does not solve my EA stall problem. I have run out of ideas about where to look and what to try. Any suggestions?

Welcome to the mql4.com forum,

Please use the SRC button when you post code. Thank you.


This time, I edited it for you.

avi-sands:
[...] Any suggestions?

This is difficult because you may or may not have discovered re-entrancy under something like extreme load, and that may or may not be your fundamental problem. I don't think that there's enough information to work with, and you need to show more of the code.

Taking just one example, if you have added the re-entrancy check in a hurry because you are desperate to fix the bug, it would be understandable if you have overlooked a code path where start() can exit without resetting the global variable. Or there could be a rare set of circumstances where you end up doing something like division by zero, causing MT4 to bail out of that call to start() without resetting the global variable. (If this is happening, then you might also not be closing the Win32 mailslot handle, which could lead to problems in the long term.)

The only actual clue is your mention of "incorrect" levels. That sounds as though you might sometimes be doing a lengthy operation in start() before evaluating the price data, so that the data the EA then examines is stale and needs a call to RefreshRates().


gchrmt4:

[...] you need to show more of the code.

[...] if you have added the re-entrancy check in a hurry because you are desperate to fix the bug, it would be understandable if you have overlooked a code path where start() can exit without resetting the global variable.


(Head banging on wall.) I should know better, but that is exactly what I did. I have two early exits from start(). One is when I have to retry a failed mailslot message send. The other is my deferred initialization check. (Since I have to go back in the price history, I need to allow the platform time to download the data. Thus I have an “if not initialized and can’t initialize yet, then return without doing anything” check.)
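
A minimal C model of one way to close those gaps, under stated assumptions: the former start() body moves into a helper so that its early returns (deferred initialization, pending send retry) can no longer skip the release. `tick_body`, `guarded_start`, `set_on_condition`, and the flag variables are hypothetical names; in the EA itself the release would presumably be a `GlobalVariableSet("Start" + Symbol(), 0)` call made after the helper returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the terminal global variable "Start"+Symbol(): 0 = free, 10 = held. */
static double start_guard = 0;
static long   reentry_count = 0;
static long   ticks_processed = 0;

/* Hypothetical flags for the two early-exit conditions described in the post. */
static bool initialized   = false;  /* deferred initialization has completed */
static bool history_ready = false;  /* enough price history has been downloaded */
static bool send_pending  = false;  /* a failed mailslot send is awaiting retry */

/* Mimics GlobalVariableSetOnCondition(name, value, check_value): sets the
   variable to value only if it currently equals check_value. */
static bool set_on_condition(double *var, double value, double check_value) {
    if (*var != check_value)
        return false;
    *var = value;
    return true;
}

/* The former body of start(); its early returns are now harmless. */
static int tick_body(void) {
    if (!initialized) {
        if (!history_ready)
            return 0;                /* early exit 1: cannot initialize yet */
        initialized = true;          /* one-time pass over past bars would go here */
    }
    if (send_pending)
        return 0;                    /* early exit 2: waiting to retry a send */
    ticks_processed++;               /* normal per-tick state machine */
    return 0;
}

/* start(): acquire the guard, run the body, and always release the guard. */
int guarded_start(void) {
    if (!set_on_condition(&start_guard, 10, 0)) {
        reentry_count++;             /* guard held: count as a re-entry */
        return 0;
    }
    int rc = tick_body();
    start_guard = 0;                 /* single release point, reached on every path */
    return rc;
}
```

In the original shape, once an early exit leaks the guard, every later tick on that chart fails the check, which would account for re-entry counts in the thousands. With the release moved after the helper, the guard can only leak if the call never returns normally, for example if MT4 bails out of it.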

So it seems that re-entrancy was a red herring and my fundamental problem remains.

The only lengthy operation in start() is my deferred initialization, and once that succeeds it is not run again. That initialization does have to run through 4300 periods of past price data.

What has left me frustrated and desperate is that I have so far failed to find an actual clue that localizes the problem, so I can't come up with a limited code extract that contains it. The total code base is only about 1200 lines, fairly straightforward, and doesn’t contain any startling algorithms, but I’m not free to release it. I’ll have to think about trimming off pieces and seeing how that affects behavior, but there isn't much to trim to begin with.
