In some cases, a merged report doesn't display the right information. We outline some worst-case scenarios and justify our implementation.
Suppose log file 1 (“requests” with “sizes”) looks like:
| request | size |
|---|---|
| A | 12 |
| B | 11 |
| C | 10 |
while log file 2 looks like:
| request | size |
|---|---|
| D | 3 |
| E | 2 |
| F | 1 |
We report on the top 2 biggest requests, so the report from log 1 looks like:
| request | size |
|---|---|
| A | 12 |
| B | 11 |
while the report from log 2 would look like:
| request | size |
|---|---|
| D | 3 |
| E | 2 |
Now we change the superservice.cfg file to list the top 4 biggest requests. A naive merge of the two reports would lead to:
| request | size |
|---|---|
| A | 12 |
| B | 11 |
| D | 3 |
| E | 2 |
Of course, this should've been:
| request | size |
|---|---|
| A | 12 |
| B | 11 |
| C | 10 |
| D | 3 |
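
The scenario above can be reproduced with a minimal Python sketch. It is an illustration only, not Lire code: the helper `top_n` and the variable names are hypothetical, and the logs are modelled as plain lists of (request, size) pairs.

```python
import heapq

def top_n(rows, n):
    """Return the n largest (request, size) rows, biggest first."""
    return heapq.nlargest(n, rows, key=lambda row: row[1])

log1 = [("A", 12), ("B", 11), ("C", 10)]
log2 = [("D", 3), ("E", 2), ("F", 1)]

# Each stored report keeps only its own top 2.
report1 = top_n(log1, 2)
report2 = top_n(log2, 2)

# Naive merge with the new limit: re-rank whatever survived in the reports.
naive_top4 = top_n(report1 + report2, 4)

# Reference result: rank the complete logs.
true_top4 = top_n(log1 + log2, 4)

print(naive_top4)  # [('A', 12), ('B', 11), ('D', 3), ('E', 2)]
print(true_top4)   # [('A', 12), ('B', 11), ('C', 10), ('D', 3)]
```

Request C is lost because it was already cut from the first report, so no amount of re-ranking at merge time can bring it back.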
This effect does not occur as long as the top limit keeps the same value. However, when we are summing values rather than reporting on distinct entries in the log, worse things can happen. Suppose we want to report on the total size by client, and the logs look like:
| client | size |
|---|---|
| a | 12 |
| b | 11 |
| c | 10 |
and
| client | size |
|---|---|
| d | 4 |
| e | 4 |
| c | 3 |
Reports from these logs, again limited to the top 2, would look like:
| client | size |
|---|---|
| a | 12 |
| b | 11 |
and
| client | size |
|---|---|
| d | 4 |
| e | 4 |
After naively merging, one would get:
| client | size |
|---|---|
| a | 12 |
| b | 11 |
In fact, since client c totals 10 + 3 = 13 across both logs, the complete report should look like:
| client | size |
|---|---|
| c | 13 |
| a | 12 |
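
The summing case can be sketched in the same way. Again this is only an illustration under assumed names (`totals_by_client` is hypothetical and not part of Lire); it shows that the error appears even though the top limit stays at 2.

```python
from collections import Counter

def totals_by_client(rows):
    """Sum the sizes of all (client, size) rows per client."""
    totals = Counter()
    for client, size in rows:
        totals[client] += size
    return totals

log1 = [("a", 12), ("b", 11), ("c", 10)]
log2 = [("d", 4), ("e", 4), ("c", 3)]

# Each stored report keeps only its own top 2 clients.
report1 = totals_by_client(log1).most_common(2)
report2 = totals_by_client(log2).most_common(2)

# Naive merge: sum the surviving report entries, then keep the top 2.
naive = totals_by_client(report1 + report2).most_common(2)

# Reference result: sum over the complete logs.
complete = totals_by_client(log1 + log2).most_common(2)

print(naive)     # [('a', 12), ('b', 11)]
print(complete)  # [('c', 13), ('a', 12)]
```

Client c never reaches the top 2 of either log on its own, so both of its partial totals are discarded before the merge ever sees them.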
Luckily, the Lire merging algorithm is not this naive: the XML reports store a few more records than are actually needed for display. This heuristic leads to sane merged reports in most cases. However, since it is merely a heuristic, it offers no watertight guarantee.
See the description of the guess_extra_entries routine in the Lire::Group manpage for more implementation details.
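
To show why storing extra records helps, and why it cannot be a guarantee, here is a rough Python sketch. It does not reproduce the guess_extra_entries routine: the `extra` parameter, the fixed padding of one entry, and the function names `report` and `merge` are all assumptions made for this example.

```python
from collections import Counter

def report(rows, limit, extra=0):
    """Total size per client, storing limit + extra entries (hypothetical padding)."""
    totals = Counter()
    for client, size in rows:
        totals[client] += size
    return totals.most_common(limit + extra)

def merge(reports, limit):
    """Sum the stored entries of several reports and keep the top `limit`."""
    merged = Counter()
    for entries in reports:
        for client, size in entries:
            merged[client] += size
    return merged.most_common(limit)

log1 = [("a", 12), ("b", 11), ("c", 10)]
log2 = [("d", 4), ("e", 4), ("c", 3)]

# One extra stored entry per report is enough to recover client c here.
padded = merge([report(log1, 2, extra=1), report(log2, 2, extra=1)], 2)
print(padded)  # [('c', 13), ('a', 12)]

# The padding is only a heuristic: with one more client ahead of c in the
# second log, c is cut from that report again and the merge is wrong.
log2_crowded = [("d", 4), ("e", 4), ("f", 4), ("c", 3)]
still_wrong = merge(
    [report(log1, 2, extra=1), report(log2_crowded, 2, extra=1)], 2
)
print(still_wrong)  # [('a', 12), ('b', 11)]
```

How many extra entries are worth storing is a trade-off between report size and merge accuracy; any fixed amount of padding can be defeated by a log with enough clients ranked ahead of the one that matters.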