Benchmarking by Dannii Willis

Versions

Download version 1/130803 for Inform 7 6G60

Description

A general purpose benchmarking test framework that produces statistically significant results.

Tags: performance, testing

Documentation

Section: Introduction

Benchmarking provides a general purpose benchmarking test framework which produces statistically significant results. Benchmarking refers to carefully timing how long some task takes to run.

This extension has two types of users in mind:

1. Story and extension authors can use Benchmarking to compare alternatives for some slow programming task. The example below shows how you might use Benchmarking to compare alternative ways to match texts.

2. Interpreter authors can use Benchmarking to compare their interpreter with others, as well as to compare interpreter updates to see whether they have a performance benefit or deficit.

The most accurate results will be obtained with a release build, as Inform's debug code will slow down some algorithms considerably; be aware that simply using the Go! button will give different results than a release build would. (And if you want to run the tests with Inform's built-in interpreter on Windows you will need to install the 2012 6G60 re-release, as the original 6G60 release did not have all the necessary functionality.)

Benchmarking is based on the Javascript library Benchmark.js <http://benchmarkjs.com>. Benchmarking depends on Real-Time Delays by Erik Temple and Flexible Windows by Jon Ingold.

The latest version of this extension can be found at <https://github.com/i7/extensions>. This extension is released under the Creative Commons Attribution licence. Bug reports, feature requests or questions should be made at <https://github.com/i7/extensions/issues>.

Section: Writing test cases

A test case should be added for each task or algorithm you wish to test. Each test case must be provided with a run phrase, which is what will be benchmarked. Unfortunately the Inform 7 syntax for attaching the run phrase is a little clunky: you must first give the phrase a name, and then attach it to the test case.

	My test case is a test case.

	To run my test case (this is running my test case):
		...

	The run phrase of my test case is running my test case.

If you are comparing algorithms for the same task it is important that they all actually do the same thing. This extension does not and cannot check whether test case algorithms are equivalent, so you should first test your algorithms thoroughly. If you are not comparing equivalent algorithms, use this option to prevent the final test comparisons:

	Use nonequivalent tests.

It is also important that test cases run the same each time through, so if your test case changes the world state in some way you must reset what it changes as part of your run phrase (see the sketch after the example below).

Test cases are a kind of thing, so like all things they can have descriptions. They can also be given an author, as shown in the example.

Some test cases might require recent or optional interpreter features. If so, you can add an initialisation rule, in which you can check whether that interpreter feature is supported, and disable the test case if not.

	To decide whether unicode is supported: (- (unicode_gestalt_ok) -).

	Rule for initialising my test case:
		unless unicode is supported:
			now my test case is disabled.

Benchmarking is currently only designed for testing Glulx functionality, and it may not work well for testing Glk functionality. If you have potential Glk test cases please contact the author.

Section: Change log

Version 1/120610:
	Added a version action
	Added a nonequivalent tests use option
	The final results are now scaled

Version 1/120218:
	Initial (non-beta) release

Example: * Text matching - Avoiding slow Regular Expressions.
*: "Text matching" Include Benchmarking by Dannii Willis. Search text is a text variable. Search text is "pineapple". Test text is a text variable. Test text is "apple banana grape orange pineapple starfruit". [ First we test what the standard rules give us. ] I7 default is a test case. The author of I7 default is "Graham Nelson". The description of I7 default is "The standard rules will use regular expressions to test if texts match, even though this is slow and inefficient." To run I7 default (this is running I7 default): if test text matches the text search text: do nothing. The run phrase of I7 default is running I7 default. [ Now check the texts directly, without using regular expressions.] To decide if (txb - indexed text) matches the text (ftxb - indexed text) without regex: (- check_for_matches({-pointer-to:txb}, {-pointer-to:ftxb}) -). Include (- [ check_for_matches text search textsize searchsize i j k; textsize = BlkValueExtent(text); searchsize = BlkValueExtent(search); for (i=0 : i<textsize - searchsize + 1 : i++) { k = 0; for (j=0 : j < searchsize: j++) { if (BlkValueRead(text, i+j) ~= BlkValueRead(search, j)) { k = 1; break; } } if (k == 0) { return 1; } } return 0; ]; -). Direct comparison is a test case. The author of Direct comparison is "Dannii Willis". The description of Direct comparison is "We can instead check directly whether the texts match." To run Direct comparison (this is running Direct comparison): if test text matches the text search text without regex: do nothing. The run phrase of Direct comparison is running Direct comparison.