Recently, more people have been writing new tests for the xfstests test suite, and it seems to me that a few of the problems stem from a misunderstanding of how xfstests determines whether a test has passed or failed.
Most unit test harnesses use a boolean value returned by the test to determine if the test passed or failed. This requires the test to not only set up, execute and tear down the test, but also to do its own results analysis. Results analysis tends to take up a lot more code and increases test complexity considerably.
Where people are going wrong is that the xfstests suite does not just use a pass/fail boolean test result to determine the result of the test. Yes, it does indeed look at the return value of the test script as one method of determining the result, but it also has two other levels of checking for failure. The first is golden output matching, and the second is checking the filesystems for consistency. Checking the test filesystems for consistency after a test is run is a pretty obvious function of a filesystem test suite, so that doesn't need any explaining. Golden output matching, however, is something completely different.
The concept behind golden output matching is that the test is going to output some data as it runs. In a traditional test suite, the output is captured by the test itself, then analysed down to a pass/fail result. The xfstests suite does not need the test to do this. Instead, the test harness captures the stdout stream, and at the conclusion of the test compares the captured output to the golden image kept by the test suite.
That is, a test in xfstests is made up of two parts - the test script and executables, and the test golden image. The golden image is essentially a capture of a successful test run, and as such we can compare subsequent test runs to the golden image. If the output of the test run is different from the golden image, then something has gone wrong and the test is considered to have failed.
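To make that concrete, here is a minimal sketch of what such a pair might look like. The test number, file names and commands are purely illustrative, and the usual boilerplate a real xfstests test carries (sourcing common/rc, cleanup traps, requirements checks and so on) is omitted:

    # tests/generic/999 (hypothetical) - the test script
    echo "QA output created by 999"

    echo "write and sync a file"
    # send the verbose command output to the full log, not stdout,
    # so it does not perturb the golden output comparison
    $XFS_IO_PROG -f -c "pwrite 0 64k" -c "fsync" $SCRATCH_MNT/testfile \
        >> $seqres.full 2>&1

    echo "done"
    status=0
    exit

    # tests/generic/999.out (hypothetical) - the golden image
    QA output created by 999
    write and sync a file
    done

Note that the noisy tool output goes to the full log rather than stdout; only the stable, deliberately chosen progress lines end up in the golden image, so they are all that gets compared.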
The part that makes this work is the test filters. Every test run will have things like slightly different numbers in the output, so if we included those numbers in the golden output the test would fail whenever the match wasn't perfect. Hence the filters replace parts of the output stream that are irrelevant to the test with known, fixed values: e.g. inode numbers get replaced with "INO", block sizes get replaced with "BLKSZ", and so on. Filters are typically implemented with sed, awk, grep and perl, and are simply interposed into the stdout stream inside the test.
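A minimal filter might look something like the sketch below. The function name, the tool being run and the exact patterns are made up for illustration, but real filters in the suite follow the same shape:

    # strip run-to-run variance out of the command output
    _filter_scratch_output()
    {
        sed -e "s/inode [0-9]*/inode INO/g" \
            -e "s/block size [0-9]*/block size BLKSZ/g" \
            -e "s,$SCRATCH_MNT,SCRATCH_MNT,g"
    }

    # interpose the filter on the stdout stream inside the test
    $SOME_TOOL $SCRATCH_MNT/testfile | _filter_scratch_output

Anything the filter normalises this way can safely appear in the golden image, because it will be identical on every run.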
The result of using a golden image comparison is that we can use the output of the test commands directly to determine success or failure of the test without needing to do any analysis of the results in the test script itself. That is, we simply need to write the test to dump the necessary (filtered) information to stdout, and the test harness takes care of the rest of the "analysis" for us. This greatly simplifies the test scripts, as even complex tests no longer need any code to analyse the results.
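Conceptually, the output check then reduces to something along these lines (a simplification of what the harness actually does, not its real code):

    # capture the test's output, then compare it to the golden image;
    # any difference at all means the test has failed
    ./tests/generic/999 > 999.out.actual 2>&1
    if ! diff -u tests/generic/999.out 999.out.actual; then
        echo "generic/999 failed: output mismatch"
    fi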
As a result, taking a standalone test, or a test from a different harness that does all its own results analysis, and dropping it straight into xfstests is not really in the best interest of the xfstests maintainers: it's much easier to verify that simple tests are doing the right thing, and time spent debugging tests rather than filesystem code is time wasted. Hence making effective use of stdout filtering and golden output matching when you port or write a new test for xfstests will make the reviewers much happier....