
Performance (InstantView)

High-performance InstantView

The performance of an application depends primarily on the performance of the infrastructure, but of course also on the program code of the application itself.

The InstantView® developer who is more interested in the pure application should be freed as far as possible from performance considerations. In this sense the CyberEnterprise® architecture realizes some concepts of automatic optimization:

  • Implicit start of a database transaction at the latest possible time
  • Placement of newly created objects in the database is controlled by the description of the database layout; logically closely related objects are grouped together using the "Lazy Creator" concept
  • Write-Lock for an object only if "deviating" data is actually written to the object (this mechanism can, however, also lead to deadlocks)
  • Optimized window objects like ObjectListView.

As long as only a few objects (< 2000) are used in a processing step, optimizations in the InstantView® code are superfluous.

The opposite is (unfortunately) true as soon as many objects are displayed, tested or in any other way included in the algorithmic process.
Then simple changes can reduce times from over an hour to a few minutes.

The following InstantView® elements are examined more closely with regard to the time required:

1. vectors

First an example: With FindAll(CX_ITEM) you can find 100,000 objects of this class in a database. According to a criterion (here: the data field uniqueID begins with 'A'), objects are selected and collected in a vector for the following processing steps. In the test, exactly half of all objects fulfill the selection criterion. How the vector is built is therefore decisive for the time required.

Var(vector)

Define(BadExample)    // poor performance
  [] -> vector
  FindAll(CX_ITEM)
  iterate
  {
    LocalVar(item) -> item
    1 item Copy(uniqueID) Left "A" = if { item vector | -> vector }
  }
;



Define(MuchBetterExample)   // much better
  [] -> vector
  FindAll(CX_ITEM)
  iterate
  {
    LocalVar(item) -> item
    1 item Copy(uniqueID) Left "A" = if { item vector Insert }
  }
;

Both code examples lead to the desired result. However, there are significant differences with regard to the time required:

 perform1.png

Why is that?

In the first example, a new vector is created for each selected object - i.e. 50,000 times.

The statement sequence

element vector1 | -> vector2

is useful if both vector1 and vector2 are still needed afterwards. But

element vector | -> vector

should always be replaced by

element vector Insert

when many elements are involved.

This is done in the second example, and the time required drops to less than 1/40 of the original, the main reason being that in the first case all 49,999 superfluous vectors have to be cleaned up by the garbage collection.

If the number of elements increases to many millions, the internal reallocation of the vector also becomes measurable. In this case the vector can additionally be pre-allocated at its target size (CreateVector).
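
As a sketch of this additional optimization (not taken verbatim from the reference - whether CreateVector takes the target size from the stack is an assumption and should be checked), only the initialization of the vector changes compared to MuchBetterExample:

Define(PreallocatedExample)   // like MuchBetterExample, but with pre-allocation
  // Assumption: CreateVector takes the target size from the stack and returns a
  // vector pre-allocated for that many elements - check the actual signature.
  100000 CreateVector -> vector
  FindAll(CX_ITEM)
  iterate
  {
    LocalVar(item) -> item
    1 item Copy(uniqueID) Left "A" = if { item vector Insert }
  }
;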

2. ObjectListView, ObjectList and ObjectCombobox

Fill list box

Define(BadFill)
  FindAll(CX_ITEM)
  iterate
  {
    LocalVar(item) -> item
    1 item Copy(uniqueID) Left "A" = if { item FillObox(, obx) }
    Index 2000 < ifnot break
  }
;

Define(GoodFill)
  LocalVar(vector) [] -> vector
  FindAll(CX_ITEM)
  iterate
  {
    LocalVar(item) -> item
    1 item Copy(uniqueID) Left "A" = if { item vector Insert }
    Index 2000 < ifnot break
  }
  vector FillObox(, obx)
;

As in the previous example (vectors), objects are selected according to a criterion. The selected objects are to be displayed in table form (the window object "obx" is an ObjectListView, ObjectList or ObjectCombobox). With 2,000 objects (1,000 of which fulfill the given criterion), is it worthwhile to collect the selected objects in a vector first?

Here is the time in seconds when two columns are displayed:

 perform2.png
As the comparison shows, the ObjectListView is optimized for displaying many objects and should therefore be preferred for large data sets.

The large time saving when inserting vectors is due to the fact that the widgets have to redraw themselves after each FillObox: inserting individual objects into a list is therefore always slower than inserting an entire collection/vector, because in the latter case the list only has to be drawn once. For ObjectList and ObjectCombobox the difference is even bigger, because there the column widths have to be recalculated after each insert based on all entries (flag: AUTO_POSITION), whereas the ObjectListView only considers the currently visible elements.
With sorted lists the effect is even greater, because a FillObox with individual elements inserts each one at its sorted position. Sorted insertion is efficient for single elements, but with many elements it is faster to insert them unsorted and sort the whole list once at the end. FillObox chooses the latter approach automatically when it is filled with a collection/vector.
The difference in speed is illustrated in the following diagram for the ObjectListView using 10,000 objects:

 perform2.png

Note: The same considerations apply when filling objects with UpdateObox (to avoid duplicate displays).

Conclusion

1. if no features realized only for the ObjectList are needed, always use the ObjectListView

2. collect many objects in a vector or transient collection and call FillObox only once - this is especially true for ObjectList and ObjectCombobox with the flag AUTO_POSITION

3. collections - set or list

If many objects are collected in a collection, the decision between LIST or SET has a large influence on performance:

Characteristics of the application → preferable collection type:

  • Objects only need to be "collected"; there are no tests for the existence of an object (not even implicit ones, see below), and the iteration order should correspond to the insertion order. → LIST
  • There are existence tests, possibly also implicit ones, e.g. if you simply want to prevent the same object from appearing more than once in the collection (sketched below); the iteration order does not matter. → SET
  • Duplicates are allowed and the order does not matter, but existence checks should be fast. → BAG
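
As a small sketch of the SET case from the list above (it only recombines statements from the examples in this chapter; that Insert on a SET silently ignores an object which is already contained, i.e. performs the existence test implicitly, is an assumption to be verified). Several CX_ITEM objects may refer to the same customer (Call(Customer) is borrowed from chapter 5 and stands for any access path that can yield duplicates), and the SET records each customer only once:

Var(customers)
CreateTransCollection(SET) -> customers
FindAll(CX_ITEM)
iterate
{
  LocalVar(item) -> item
  item Call(Customer) customers Insert   // implicit existence test via the SET (assumption)
}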

The following diagram shows the time required to insert n objects into the different collection types. This clearly shows that LIST is best suited for large amounts of data and that the choice of collection has little effect on performance for smaller data sets.

 perform3a.png

The following section deals with collections of type SET:

Test for the existence of an object

The test whether an object is contained as an element in a set of objects can be relevant for the time response.

Sets of persistent and/or transient objects are formed by

  • (persistent and transient) Collections
  • Vectors (always transient)

Collections exist as array, list, set and bag.
The InstantView® statement Contains provides information about whether an object is an element of a collection/vector.
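
A minimal usage sketch (the pattern element collection Contains, leaving a truth value that if/ifnot can consume, is taken from the test code further below; item and coll stand for any object and any collection/vector):

item coll Contains if { "item is already contained in coll" Attention(,INFO) }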

Time needed for this existence test with a vector / collection with n elements:

Time requirement as a function of n, proportional to:

  Vector               n
  Collection (array)   n
  Collection (list)    n
  Collection (set)     log n
  Collection (bag)     log n

If the set of objects is a vector or a collection of type array or list, it is worth copying it into a transient collection of type set and running the tests against this copy, provided the number of tests is sufficiently large.

What is meant by a "sufficiently large" number?
The following example builds a collection with 10,000 or 100,000 elements (ELEMENT_COUNT), then picks TEST_COUNT random elements from it and measures how long the existence checks for these elements take in total.

Contains with LIST
Var(ELEMENT_COUNT) 10000 -> ELEMENT_COUNT
Var(TEST_COUNT) 10000 -> TEST_COUNT
Var(coll, vecCopy)
[] -> vecCopy
CreateTransCollection(LIST) -> coll

// Create our collection to test and a vector copy
FindAll(CX_PERSON)
iterate
{
  Index ELEMENT_COUNT < ifnot { Drop break }
  Dup coll Insert
  vecCopy Insert
}

// Now select TEST_COUNT random elements from vecCopy and insert them into testVec
Var(testVec) [] -> testVec
do
  Index TEST_COUNT < ifnot break
  0 ELEMENT_COUNT 1 - GetManager(TEST) Call(GenerateInt) vecCopy GetElement testVec Insert
loop

Var(t1,t2)
CreateTransObject(CX_TIME) -> t1
// Now perform TEST_COUNT existence checks with the randomly selected elements from testVec
testVec iterate { coll Contains Drop }
CreateTransObject(CX_TIME) -> t2
"s" t2 t1 - Call(Convert) Attention(,INFO)

In contrast, the following code first copies the entire list into a new SET and then performs the existence check on the copy.

Contains with SET copy
Var(ELEMENT_COUNT) 10000 -> ELEMENT_COUNT
Var(TEST_COUNT) 10000 -> TEST_COUNT
Var(coll, vecCopy)
[] -> vecCopy
CreateTransCollection(LIST) -> coll

// Create our collection to test and a vector copy
FindAll(CX_PERSON)
iterate
{
  Index ELEMENT_COUNT < ifnot { Drop break }
  Dup coll Insert
  vecCopy Insert
}

// Now select TEST_COUNT random elements from vecCopy and insert them into testVec
Var(testVec) [] -> testVec
do
  Index TEST_COUNT < ifnot break
  0 ELEMENT_COUNT 1 - GetManager(TEST) Call(GenerateInt) vecCopy GetElement testVec Insert
loop

Var(t1,t2)
CreateTransObject(CX_TIME) -> t1
// Copy the list into a SET first
Var(set)
CreateTransCollection(SET) -> set
set coll +=
// Now perform TEST_COUNT existence checks with the randomly selected elements from testVec
testVec iterate { set Contains Drop }
CreateTransObject(CX_TIME) -> t2
"s" t2 t1 - Call(Convert) Attention(,INFO)

With 10,000 elements in the collection and a variable number of existence checks:
 perform3b
and with 100,000 elements in the collection:

 perform3c

The number of tests must be large enough to make the additional effort of copying the collection worthwhile. The positive effect of the set only becomes apparent when the number of tests is at least 10% of the number of elements in the collection. In the following graph, the duration of the copy is plotted against the number of elements in the collection. The duration of the set copy grows at least linearly, and in our test with 100,000 elements more than 90% of the total time was spent on the set copy alone.

Conclusion:

The approach to copy a list into a set in order to be able to check for existence faster is therefore only worthwhile in special cases with an extremely large number of queries.

4. loading routines

The name InstantView® refers to visualization of data, not necessarily to batch runs.
If loading routines are written with InstantView®, points 1 and 3 above can be particularly relevant.
Frequently executed queries should be accelerated by database indexes.

Loading routines can be additionally accelerated by the following measures:

Accelerate CreatePersObject (deadlock prevention)

To eliminate deadlocks between two clients executing the same code, CreatePersObject and TriggeredStateMonitor switch the lock mode of the database to (WRITE,PAGE) before they read the root entry point collection (CreatePersObject) or the start state (TriggeredStateMonitor), and reset the lock mode once the operation is complete. This switch can be avoided either by disabling the DeadlockPrevention mechanism completely for the load run or by executing a single BeginLock(WRITE) at the beginning of the load run. In both cases the change of locking mode is skipped, which makes CreatePersObject run much faster.

Below are the test code and the measurement results for 100,000 × CreatePersObject (each measurement was performed in an empty database).

CreatePersObject DeadlockPrevention test code
//FALSE GetManager(OBJECT) Call(EnableDeadlockPrevention)
//BeginLock(WRITE)
Var(OBJECT_COUNT) 100000 -> OBJECT_COUNT
Var(t1,t2)
CreateTransObject(CX_TIME) -> t1
do
{
  Index OBJECT_COUNT < ifnot break
  CreatePersObject(CX_PERSON) Drop
}
loop
EndTXN
CreateTransObject(CX_TIME) -> t2
"s" t2 t1 - Call(Convert) Attention(,INFO)

 perform5

As you can see, the deadlock prevention mechanism makes CreatePersObject run about 7× slower, so in load runs it makes a measurable difference whether the write lock is set only once or not at all. The time difference between disabling the mechanism completely and setting the write lock once before loading is marginal; the BeginLock(WRITE) approach should generally be preferred, since it restores the lock mode after loading and does not disable the deadlock protection.

In normal operation, however, it makes no sense to set a BeginLock(WRITE) before each CreatePersObject for performance reasons: despite the locking overhead, a single CreatePersObject still executes in the µs range, and the potential performance gain would not be noticeable to a user.

This difference is only noticeable in load runs that create persistent objects in large numbers. The test is also not a representative example of a load run, since the generated objects are not filled with data here. In a real application example, the measured time difference over the entire load run was 25%.

Unregistering transient collections from the garbage collection

If CreateTransObject is used to create a large number of objects that are supposed to exist over a longer period of time (and are therefore e.g. kept in a vector), the garbage collection of the ClassiX® system can become extremely slow, because it goes through all variables of all modules at regular intervals and checks which objects are accessible via these variables in order to delete the unreachable objects. For this purpose, vectors and transient collections must also be searched.

In this case, it is useful to create the objects with CreateTransObject(..., KEEP), so that they are not managed (and cannot be deleted) by the garbage collection. If the objects are held in a vector/collection, this vector/collection can additionally be marked for the garbage collection using UnprotectContents, so that the garbage collection does not check its contents and the number of elements therefore has no influence on the performance of the garbage collection.

To clean up the temporary objects afterwards, the flag on the vector/collection should be removed again after the run via ProtectContents, and the elements created via CreateTransObject(..., KEEP) should be registered with the garbage collection via register.
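
A hypothetical sketch of this pattern (the exact stack behaviour of UnprotectContents/ProtectContents and the spelling of the register statement are assumptions derived from the description above and should be checked against the reference):

Var(cache) [] -> cache
cache UnprotectContents                  // GC no longer scans the contents of the vector (assumption)

do
{
  Index 100000 < ifnot break
  CreateTransObject(CX_ITEM, KEEP)       // KEEP: object is not managed by the GC
  cache Insert
}
loop

// ... long-running processing that uses the cached objects ...

cache ProtectContents                    // make the contents visible to the GC again (assumption)
cache iterate { register }               // hand each KEEP object back to the GC (assumed spelling)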

Transaction splitting

All changes to the persistent storage must be recorded in a transaction log for the duration of a transaction so that the changes can be written to the database or rolled back at the end of the transaction. For larger load runs, this transaction log may grow rapidly and the management of the changes may take a measurable amount of time. If this suspicion arises, you can try to distribute the run over several small transactions so that the log does not grow too large.

Small transactions are especially important if many changes are made in the persistent area during live operation, because then all pages that have been changed are locked for all other clients for the duration of the transaction and they have to wait for the end of the transaction.

However, transaction splitting should be used with caution for pure load runs and only if the long transaction has been identified as a possible performance problem. The following measurements show that in many cases transaction splitting has no effect or even significantly worsens performance, so no recommendation can be made here for a specific number of objects per transaction.

For the test, the duration for creating and filling 1,000,000 persistent objects in an empty database was measured depending on the selected number of objects per transaction with the following code.

FALSE GetManager(OBJECT) Call(EnableDeadlockPrevention)
Var(OBJECT_COUNT) 1000000 -> OBJECT_COUNT
Var(OBJECTS_PER_TXN) 100000 -> OBJECTS_PER_TXN
Var(t1,t2)
CreateTransObject(CX_TIME) -> t1
do
{
  Index OBJECT_COUNT < ifnot break
  Index OBJECTS_PER_TXN Mod ifnot { EndTXN BeginTXN }   // restart txn
  LocalVar(firstName, name, person)
  CreatePersObject(CX_PERSON) -> person
  3 10 GetManager(TEST) Call(GenerateName) -> firstName
  3 15 GetManager(TEST) Call(GenerateName) -> name
  firstName person Put(firstName)
  name person Put(name)
  // combine name and index into uniqueID
  name Index + person Put(uniqueID)
}
loop
EndTXN
CreateTransObject(CX_TIME) -> t2
"s" t2 t1 - Call(Convert) Attention(,INFO)

In the result, the number of objects per transaction is plotted against the total duration of the run; there is no significant difference between one transaction and ten transactions. From 20 transactions onwards, the run becomes increasingly slower as the number of transactions grows, because each transaction also has a certain overhead.

 perform3c

Efficient import of Excel files

Excel files (.xlsx) are a frequently used (though not well suited) data exchange format, since most programs offer an Excel interface for import/export. The exported Excel files can be several hundred megabytes in size and because Excel is a packed format, the unpacked data volumes can be much larger. In read mode, the contained .xml files are unpacked by the CX_EXCEL_XML class into the TEMP directory and read from there with a SAX parser. The advantage of this method is that, theoretically, files of any size can be read in, since they do not have to be loaded into memory.

Note: Practically the size is limited by the currently used ZIP library to .xlsx files whose unpacked contents are < 2GB in size.

Internally, our Excel object has 5 SAX scanners per Excel worksheet, which can only move forward. If a cell is requested via GetValue, the scanner closest to this cell is moved forward to the desired position. If all scanners are behind the queried cell, then the scanner that is furthest away is moved back to the beginning of the worksheet. This procedure ensures that large worksheets can be read in very efficiently as long as the data is read in from top to bottom and there is no excessive jumping within the columns.

The following example shows the performance difference between correct reading direction, reversed reading direction and correct reading direction with jumps within the cells. The tested .xlsx file consists of a worksheet with 2000 rows x 6 columns filled with numbers in ascending order.

Read performance test code
Var(excel) CreateTransObject(CX_EXCEL_XML) -> excel
"CX_ROOTDIR\\testdata.xlsx" excel Call(LoadFromFile)

Var(rows, columns)
2000 -> rows
[ 1 2 3 4 5 6 ] -> columns     // forward column order
//[ 1 6 2 5 3 4 ] -> columns   // jumping column order
//[ 6 5 4 3 2 1 ] -> columns   // reverse column order

Var(row, column)
0 -> row
//rows -> row                  // for descending row order

Var(t1,t2)
CreateTransObject(CX_TIME) -> t1
do
{
  Incr(row)
  columns iterate(BACKWARD) { row 1 excel Call(GetValue) Drop }
  row rows > !
  //Decr(row)                  // for descending row order
  //row 0 >
}
while
CreateTransObject(CX_TIME) -> t2
"s" t2 t1 - Call(Convert) Attention(,INFO)

Below are the results for reading the 14,000 cells:

 perform6

Here, Z+ stands for rows read in ascending order (Z- for descending), S+ for columns read in ascending order, Sx for columns read in jumps and S- for columns read in descending order.

As the measured values show, the order in which data is read from an Excel file plays a significant role, even with a rather small file of 2000 x 7 cells. Because the Excel class uses 5 SAX scanners, it is possible to jump within the columns to a certain extent without any loss of performance during the import. However, if this order is disregarded, the file has to be read through again and again from the beginning, which here leads to a speed difference of a factor of 500. Of course, the factor increases with the size of the file to be read.

Index maintenance with function UniqueIDIndexMaint

Index maintenance for indexes on the fields (uniqueID and transaction) is very time-consuming when loading objects of class CX_TRANSACTION or derived classes. For this reason, such indexes should, if possible, be deactivated for the duration of the load process and reactivated afterwards.

5. display objects with additional calculated data in an ObjectListView

Summary:

  1. Several columns need the same preparations to display their data.
  2. A macro carries out the preparations. These intermediate results are stored in variables. Of course, several macros can also perform different preparations, for example, if columns 1 and 2 require preparation A and columns 3 and 4 require preparation B.
  3. The return value of (2) is the object for which the preparations were made (in other words, the stack looks the same at the end of the macro as it did at the beginning). This way, the access expression of the column can be continued.
  4. The macros that deliver the final result for the column use the intermediate results from (2).
  5. The flag OPTIMIZE must be set for the ObjectListView so that the preparations are not carried out more than once and performance is not affected.

Description:

In addition to the individual fields of an object, other data can also be displayed in a ListView, which is calculated for each object individually, sometimes with the help of other objects. The access expression call() makes this possible. Let us assume that two columns refer to the same intermediate result:

Var(pCustomer)

Define(Prepare)
  Dup Call(Customer) -> pCustomer
;
Define(CustomerName)
  pCustomer Copy(name)
;
Define(CustomerID)
  pCustomer Copy(uniqueID)
;

[ "call(Prepare).call(CustomerName)" ] SetFormat
[ "call(Prepare).call(CustomerID)" ] SetFormat

Both customer macros rely on the pCustomer variable being set correctly. It would be tempting to omit call(Prepare) from the second column, since the second column is only drawn after the first. However, this does not hold when sorting! When the second column is sorted, the first column is not touched, and thus all variables in which intermediate results are stored contain stale values!

The Prepare macro returns the object for which it was called, so that the access expression can be continued.

This technique is still very inefficient, since pCustomer is determined anew for both columns. The following picture would appear in the profiler:
call(Prepare)
call(CustomerName)
call(Prepare)
call(CustomerID)

However, the flag OPTIMIZE of the ObjectListView causes the partial expression call(Prepare) to be executed only once for the entire line:
call(Prepare)
call(CustomerName)
call(CustomerID)

When sorting, call(Prepare) is called for each row and column individually; the OPTIMIZE flag has no effect here.

6. further performance pitfalls

Database mode not adapted to application

In regular operation of ClassiX, the wrong database mode (OpenDB, SetDBMode & BeginTXN) can cause bad performance. The choice of a database mode is always a trade-off between different aspects, and no general recommendation can be given for all use cases; the underlying considerations, together with recommendations for some application cases, are documented separately. An unsuitable database mode can lead to the following problems, among others:

  • The client cache is constantly discarded due to frequent changes of the database mode, which has a strong impact on read/write performance.
  • A client that performs read-only operations in "KEEP_UPDATE" mode will be blocked by other, writing clients, or will block them and must wait for the end of the transaction. This can also lead to a deadlock, which means that one of the two transactions must be aborted.

Locking mode not adapted to application

The locking mode, too, can cause performance problems if used incorrectly. Write locks (BeginLock & SetDefaultLockMode) are mainly used to prevent deadlocks that would cause a transaction to be aborted. The locks ensure that two clients that first read and then write to the same object do not run into a deadlock: already the first (read) access by one client forces the other client to wait for the end of the transaction before it can access the object. In multi-user operation, these waiting times can quickly become a performance problem. The wait times are particularly significant if:

  • a very pessimistic default locking mode is set for all clients (example: "WRITE" "DATABASE")
  • there are many long running transactions

Multiple Unlink on a COLL or REL slot

For the performance of reorganization runs, it can be helpful to know that when removing objects from slots of type COLL or REL_M1 or REL_MN, Unlink searches the underlying collection (a list) from back to front. If an object has a huge collection with several hundred thousand elements and you want to remove many thousands of elements from it, you should remove these elements in the reverse order in which they were added, so that Unlink has to check as few elements as possible in each step.
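
A hypothetical sketch of that advice (owner stands for the object with the huge COLL slot, items for its slot name, and toRemove for a vector holding the elements to be removed in the order in which they were originally linked; the exact stack order expected by Unlink is an assumption to be checked against the reference):

toRemove iterate(BACKWARD)               // process the most recently linked elements first
{
  LocalVar(elem) -> elem
  elem owner Unlink(items)               // assumed stack order: element, owning object
}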