Tentative Initialization / Dispatch Performance Enhancement (Oct 2012)
Developed during analysis of Gnats 5222 (slowdown of subsequent runs).
Phil Weinstein / CADSWES / 10-23-2012
A recent fast-running bug model -- Gnats 5231 -- demonstrated an interactive-run performance degradation similar to that reported in another recent bug -- Gnats 5222 -- which had been difficult to reproduce reliably, even by the user who originally reported the problem.
In fact, it's likely that the similar performance degradations of the two models have different causes. "5222" wasn't reproducible by simply re-running the model (in the same RiverWare session, as originally reported) -- operating some dialogs in between runs was apparently necessary to cause subsequent runs to take more time. However, with "5231" (correction), the second run was reliably and significantly slower, in particular on "64-bit".
Using Quantify (available only on 32-bit), the difference between the first and subsequent runs was not detected -- each of the subsequent runs takes about the same amount of time. However, Quantify was useful in confirming manual "sampling" of the long delays found by randomly stopping a 64-bit run in the debugger -- which pointed to massive dynamic memory allocation processing for very small pieces of data. This was in and "around" our "SlotSet" bit-array class used in the Simulation Object dispatch mechanism.
Recoding the simple SlotSet class, and the Simulation Object's Series of SlotSets to use static allocation in place of dynamic allocation resulted in PROFOUND performance improvements with this model on 64-bit -- especially for "subsequent" runs.
These results are presented as RiverWare plots (prepared manually). Sequences of a dozen runs within single RiverWare sessions were performed with and without these enhancements -- each test done twice. In the pairs of traces shown in the following plots (run time is along the vertical axis, successive runs are arranged horizontally), the higher value traces are the baseline times; the lower value traces are with the code enhancements described below.
UNFORTUNATELY, most of the other models with which I have tested these changes do NOT show a performance benefit. For example, the original 5222 bug model shows only a minor benefit, well within the variability of the results.
The changes are checked in to a branch in GIT ... see the following commit link:
https://cadswes2.colorado.edu/internal/cgi-bin/gitweb/gitweb.pl/builds.git/commit/34f93a5bc8b62952ed17885fb9dbb1833fa8c64d
The following commit notes have some additional information about the context of this performance enhancement work, including a characterization of the model for which these changes show profound benefits.
Initialization / Dispatch Performance Optimization (Tentative)

Bug Number: 5222 (with the Gnats 5231 model)
Release notes (y/n): No
For Release Nums: 6.3

A PROFOUND performance enhancement was realized with the Gnats 5231 Simulation Model (204 objects; only 610 timesteps, but with a FEW slots having 29,000+ presimulation timesteps). SEE performance statistics (shown in a RiverWare plot, manually prepared):
http://cadswes2.colorado.edu/~philw/2012/bugs/5222/PerformEnhance-2012-oct-22.gif

HOWEVER in testing so far (with just a few models), a performance benefit is unclear in OTHER models with these software changes. So, whether we will consider using (keeping) these changes is still under review.

The changes are related to moving from DYNAMIC ALLOCATION to STATIC ALLOCATION of bitmaps used primarily within the SimObj dispatching mechanism.

Note that the MANY presimulation timesteps in just a few slots in the 5231 model ARE represented in the SimObjs' "_currentSet" SlotSet series. This is apparent from this debug trace:
http://cadswes2.colorado.edu/~philw/2012/bugs/5222/initCurSet_bug5231.txt

The code changes are these, detailed below:

(1) SlotSet (bit array) has been changed from a dynamic array to a fixed-length static array.
(2) SimObj's Series of SlotSet pointers (to dynamically allocated SlotSets) has been changed to a Series of SlotSet values.
(3) DispatchEntry's two SlotSet pointers (to dynamically allocated SlotSets) have been changed to direct SlotSet members.

---------------
Sim/SlotSet.hpp
Sim/SlotSet.cpp
---------------

Although SlotSet could be used as a general bit-array, it has provisions specifically for the support of Series Slot's (and Series Slot Proxy's) Dispatch IDs (indices). It is used in two modules:

(1) In SimObj to identify a set of Dispatch Slots on the Object.
(2) In SimObjMultiSlot to identify a set of SubSlots having a known value (at a particular timestep).
This class has been rewritten to no longer use our VectorSet class, which dynamically allocates the required number of "words" for a variable-length bit array. The SlotSet class is used primarily for a predictably sized bit array dependent on the C++ implementation of the Engineering Objects -- specifically the number of Dependent Slots in each engineering object type. At this time, this upper limit is 81, as depicted in this test output:
http://cadswes2.colorado.edu/~philw/2012/bugs/5222/SlotSetExceedErrorTest.gif
http://cadswes2.colorado.edu/~philw/2012/bugs/5222/SlotSetExceedErrorTest.txt

HOWEVER, the capacity of SlotSet required by SimObjMultiSlot is a function of the model's configuration rather than internal RiverWare design. SimObjMultiSlot uses SlotSet to represent the known values among its subslots. As currently implemented, an assertion will fail if a SimObjMultiSlot has more than 96 subslots (links to other slots).

SlotSet is now coded for a fixed-length array of three (3) 32-bit "words" to support up to 96 bits. It uses these internal definitions and fields:

   enum { BITS_PER_WORD = 32 };  // number of bits in word
   enum { WORD_CNT = 3 };        // max number of 32-bit words
   enum { BIT_CNT = 96 };        // max number of bits
   quint32 _words [WORD_CNT];

The constructor no longer includes an integer parameter for the required number of bits. Instead, the clients of this class (currently SimObj and SimObjMultiSlot) confirm their use of SlotSet with a call to this static method. (The "context" objects are used only for RiverWare Internal Error generation):

   static void SlotSet::confirmSlotCapacity (int cnt, SimObj* contextObj, Slot* contextSlot=NULL);

A capacity problem should arise only with coding enhancements to the Engineering Objects (i.e. the number of Dependent Slots exceeding 96) OR with a model having a SimObjMultiSlot with more than 96 subslots. (That would be for LINKS from other slots).
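The fixed-length layout described above can be illustrated with a minimal sketch. This is NOT the RiverWare source: the class name FixedSlotSet and the methods set, test and fits are hypothetical, std::uint32_t stands in for Qt's quint32, and fits only returns a flag where the real confirmSlotCapacity generates a RiverWare Internal Error. Only the three enum values and the _words field are taken from the notes above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a fixed-capacity, 96-bit set (illustration only).
class FixedSlotSet {
public:
    enum { BITS_PER_WORD = 32 };  // number of bits in word
    enum { WORD_CNT = 3 };        // max number of 32-bit words
    enum { BIT_CNT = 96 };        // max number of bits

    // All bits start cleared; no heap allocation is involved.
    FixedSlotSet() : _words{0, 0, 0} {}

    // Assumed stand-in for a capacity check: true if 'cnt' bits fit.
    static bool fits(int cnt) { return cnt >= 0 && cnt <= BIT_CNT; }

    // Set one bit; the index must be within the fixed capacity.
    void set(int bit) {
        assert(bit >= 0 && bit < BIT_CNT);
        _words[bit / BITS_PER_WORD] |= std::uint32_t(1) << (bit % BITS_PER_WORD);
    }

    // Query one bit.
    bool test(int bit) const {
        assert(bit >= 0 && bit < BIT_CNT);
        return ((_words[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1u) != 0;
    }

    // True if every bit set in 'other' is also set in this set.
    bool isSuperSetOf(const FixedSlotSet& other) const {
        for (int i = 0; i < WORD_CNT; ++i) {
            if ((other._words[i] & ~_words[i]) != 0) return false;
        }
        return true;
    }

private:
    std::uint32_t _words[WORD_CNT];  // fixed storage, embedded in the object
};
```

Because the storage is embedded in the object, such a set can be copied, held by value in a container, or declared as a direct data member without any call to the allocator. The cost is the hard capacity limit discussed above.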
If we are concerned about this limit as applied to SimObjMultiSlots, the original SlotSet implementation could be re-introduced as a distinct class, e.g. DynamicSlotSet.

There was also some API cleanup to address ambiguity of method names, e.g. 'isSubSet' was changed to 'isSuperSetOf'.

--------------
Sim/SimObj.hpp
Sim/SimObj.cpp
--------------

SimObj's Series of SlotSet pointers (to dynamically allocated SlotSets) has been changed to a Series of SlotSet values. This Series represents the set of known Dispatch slots at each timestep:

   OLD: Series<SlotSet*>* _currentSet;
   NEW: Series<SlotSet>*  _currentSet;

References to _currentSet elements within the SimObj implementation were changed from pointer-references to mutable C++ references (&).

A couple of initialization iterations could be removed because the default SlotSet constructors take care of the Series element initialization. (Those initializations were happening before too, within the formerly dynamically allocated SlotSet instances). Similarly, there is no longer a need for deletion iterations.

In places where SlotSet elements are created (when _currentSet is modified), calls to the SlotSet capacity-checking method are made. (See above).

There was some recoding for more efficient handling of DispatchEntries (see following).

---------------------
Sim/DispatchEntry.hpp
Sim/DispatchEntry.cpp
---------------------

(1) Change of the two dynamically allocated SlotSet instances to direct SlotSet members.

   OLD: SlotSet *needed;
   OLD: SlotSet *solvedFor;
   NEW: SlotSet _needed;
   NEW: SlotSet _solvedFor;

... also applied our data member name convention to other fields (i.e. starting with an underscore).

(2) Other code cleanup, e.g. eliminated redundant series timestep lookup and unnecessary ControlInfo structure copying in these accessors:

   unsigned char DispatchEntry::isActive (const Date_Time*) const
   void DispatchEntry::setActive (const Date_Time*)
   void DispatchEntry::deactivate (const Date_Time*)
   void DispatchEntry::setSimBlock(const Date_Time*, int block)
   int DispatchEntry::getSimBlock (const Date_Time*)

-----------------------
Sim/SimObjMultiSlot.cpp
-----------------------

Client-level changes for changes to the SlotSet class. (See notes above). However, SlotSet instances used within this class are still dynamically allocated.

--------------------------
QtRpl/DispatchInfoTree.cpp
--------------------------

Client-level changes for changes to the DispatchEntry class. (See notes above).
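The SimObj change from a Series of SlotSet pointers to a Series of SlotSet values, described in the notes above, can be sketched roughly as follows. Everything here is an assumption made for illustration: std::vector stands in for RiverWare's Series template, Bits96 stands in for SlotSet, and the function names are hypothetical. The old code used raw pointers with explicit deletion loops; std::unique_ptr is used here only to keep the sketch leak-free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Stand-in for the fixed-length SlotSet: three zero-initialized 32-bit words.
struct Bits96 {
    std::uint32_t words[3] = {0, 0, 0};
};

// OLD-style layout: the series holds pointers, so building it costs one
// small heap allocation per timestep in addition to the pointer array.
std::vector<std::unique_ptr<Bits96>> makePointerSeries(std::size_t timesteps) {
    std::vector<std::unique_ptr<Bits96>> series;
    series.reserve(timesteps);
    for (std::size_t i = 0; i < timesteps; ++i) {
        series.push_back(std::unique_ptr<Bits96>(new Bits96()));
    }
    return series;
}

// NEW-style layout: the series holds values in one contiguous buffer.
// Elements are default-constructed (all bits clear), so no separate
// initialization pass and no deletion pass are needed.
std::vector<Bits96> makeValueSeries(std::size_t timesteps) {
    return std::vector<Bits96>(timesteps);
}

// Element access by mutable C++ reference rather than by pointer.
Bits96& elementAt(std::vector<Bits96>& series, std::size_t index) {
    assert(index < series.size());
    return series[index];
}
```

With a value series, a series that spans tens of thousands of presimulation timesteps is still a single allocation per object, which is consistent with the observation above that the benefit shows up mainly in a model like 5231.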
---