Reports Smell

ReportingFrameworks

If your application consists primarily of reports, this is a smell because:

  1. It's an indicator of incomplete or vague user needs. Essentially, a request for a report means "Gather a bunch of data and show it to me nicely formatted". Fine, but what exactly do all the pretty rows and columns tell you? Discover the true "why" for the report, and the need for the report evaporates.
  1. Reporting tools often cast domain objects in the role of dumb data containers. Instead of having useful business-oriented behavior, classes that otherwise would define suitable behaviors are diminished to just carrying values from one location (usually an RDBMS) to another, such as a piece of paper. See AttributeShufflingReduction.
  1. There is a certain implication that the reports, once defined and written, never change and will always be what is necessary. Stories of reports that continue for years being generated and sent to people who never use them or even know why they are getting them are not uncommon.
  1. The typical reporting run, being a batch-and-queue process, can wreak havoc with the functioning of an interactive system while the reporting is occurring. Reports against databases often do expensive table scans, killing performance for users doing transactional work. Reports also exercise the object model in ways different enough from live transactional usage that the tradeoffs between what's right for reporting and what's right for interactive use are difficult to resolve. This is mitigated by having a DataWarehouse to do OnLineAnalyticalProcessing (in other words, reports); the separation avoids wreaking havoc on the OnLineTransactionalProcessing system.
  1. The request for a report is often a sign of an implicit idea that the computer can only do "data processing", and that the real analysis can only be done by hand. While that is still true for many classes of problems, the ability of programmers and systems to simulate and analyze is well advanced.
  1. Reports are often artifacts of a time before user interface hardware and software had reasonable formatting capabilities. Consider the low-resolution screens and limited screen prints of low-quality local printers, compared to the formatted output a centralized, high-quality line printer could generate. With modern user interfaces and hardware, the information is often presentable on screen, and if not, printing from the application in a suitably formatted, high-quality manner is quite easy.
  1. Typical reporting tools connect to the underlying data in a different way than the core application and thus introduce duplication. Every change to the data structure made to support application changes has the potential to require changes in the reports. In one case, a well-known reporting tool was deployed using its own SQL to query the relational database used by the application; when the underlying table structures and column definitions changed, the result was a cascading series of required changes to the reports, because they had their own SQL.
  1. Reporting tools compromise testability and maintainability for marketable features such as "easy to use", "no coding required", and so forth. This frequently results in the AutogeneratedStovepipeAntiPattern.
  1. It may be the case that behavior that would be useful in the application is relegated to the reporting system because of an assumption that it would be too performance- or resource-intensive to put it in the normal interface. Remember, optimization is not what you think it is. If the behavior is highly valued by the customer, and at least a spike solution has demonstrated genuine performance concerns, the development team should let the customer decide whether additional time spent modifying the implementation or solution to get acceptable performance is worth it, or whether to drop the requirement and relegate it to the reporting framework.
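The duplication point above can be sketched in miniature. This is a hypothetical example, not a depiction of any particular reporting tool: the report reuses the application's own query function instead of carrying its own SQL, so a schema change is absorbed in one place rather than cascading into every report.

```python
# Hypothetical sketch: the application owns one query function; the
# report only formats its results and carries no SQL of its own.
import sqlite3

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, "Acme", 120.0), (2, "Acme", 80.0), (3, "Zenith", 40.0)])
    return conn

def totals_by_customer(conn):
    """The application's query -- the single place that knows the schema."""
    return conn.execute(
        "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()

def report(conn):
    """The report formats the application's data; no duplicated SQL here."""
    return "\n".join(f"{cust:<10} {amt:>8.2f}" for cust, amt in totals_by_customer(conn))
```

If the `orders` table is restructured, only `totals_by_customer` changes; the report is untouched.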

I think specific scenarios should be presented rather than summarily saying that "reports are bad". Let's look at the alternatives first before we toss them. Also, the above seems to take a pro-OO/anti-RDB stance, for good or bad.

Agreed. We also need to see specific scenarios in user requirements where the creation of a report really is what the user wants and satisfies the requirements. As noted above, it is too often assumed that some user-level need for system visibility can be satisfied by a report. Better requirements and understanding of our customers are essential. Personally, I see teams far too frequently take the easy way out and generate dense and ill-conceived reports in response to a user request for a specific need. --StevenNewton

What if the only way you could get a word/line/page count and spellcheck on your Word document was to generate a report that was disconnected from the actual editing functions? If you had to manually compare the report window results with the document, make changes, then re-run the report to see the results? That's the kind of thing that happens in applications all the time.

Real-time info is not always practical to provide, especially if it sums and compares a lot of different things. For example, I have seen word processors that can provide a lot of stats about a document, such as number of paragraphs, average words per paragraph, average letters per word, average sentence length, and even an estimated grade level of the writing using a specific algorithm. It is not realistic to expect these to be re-calculated on every keystroke just so that they are there. Also, I am not sure if you are talking about a printed report or a screen-based report.

Can we see some measurements to demonstrate that on-the-fly maintenance of these statistics is a performance problem? Otherwise, beware PrematureOptimization.
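In that spirit, a quick measurement is cheap to write before committing to an incremental design. The sketch below is illustrative only (the document and word count are made up); it times a full-document recount to see whether per-keystroke recalculation is actually a problem.

```python
# Measurement sketch: time a full-document stats pass before assuming
# it is too slow to run per keystroke (beware PrematureOptimization).
import timeit

document = "lorem ipsum dolor sit amet " * 40_000   # roughly 200,000 words

def full_stats(text):
    words = text.split()
    return {"words": len(words),
            "chars": len(text),
            "avg_word_len": sum(map(len, words)) / len(words)}

seconds = timeit.timeit(lambda: full_stats(document), number=10) / 10
print(f"full recount of {full_stats(document)['words']} words: {seconds * 1000:.1f} ms")
```

If the measured time is well under a keystroke interval, the delta-based scheme may be unnecessary complexity.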

Either you recalculate the entire document on each keystroke, which is a performance risk at the least, or you use some kind of delta-based scheme. Clearly the delta-based approach is going to be more coding and potentially more buggy, since things may get out of sync under conditions we fail to anticipate. I estimate a good delta approach would take 10 times more coding/debugging. I tried to do something similar for accounting data-entry once, and it turned out to be more code and more rules than I anticipated. And, as a user, I usually don't need such a total-document stat feature on every single keystroke. Thus, the cost-to-benefit ratio is high (DecisionMathAndYagni). Sometimes MS-Word gets hurky-jerky if there are a lot of other apps or high-priority tasks running in the background. It may pause for a few seconds to re-calc the page numbers when I enter text or change the boundaries. I would rather it not do that at times. It is bad enough that people interrupt my train of thought while writing; I don't need the same from a PC. Anyhow, I would like to get away from word-processor scenarios and back to biz examples if possible, please. --top

Users don't need a per-keystroke response time. This sort of interactive report can be handled quite effectively by simply recalculating it a few seconds after the user has stopped doing anything. There are always compromises that can be made.
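That compromise can be sketched as a debounce: each keystroke resets a timer, and the recalculation fires only once typing has paused. The class and delay below are illustrative, not from any particular editor.

```python
# Sketch of "recalculate a few seconds after the user stops typing",
# implemented as a debounce timer.
import threading

class Debouncer:
    def __init__(self, delay, action):
        self.delay = delay      # seconds of quiet before recalculating
        self.action = action    # e.g. recompute the document statistics
        self._timer = None

    def poke(self):
        """Call on every keystroke; only the last call within `delay` fires."""
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.delay, self.action)
        self._timer.start()
```

A burst of `poke()` calls produces a single recalculation shortly after the burst ends.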

Such info is still a "report". Being on the status bar doesn't change its general nature. Also, it may make more sense to have some basic or common stats be almost-real-time, but fancier stats be accessible via a button where you would otherwise put such stats. Example:

Total characters........: 2403
Total words.............: 480
Total pages.............: 3
Ave. chars per word.....: 5.01
Total paragraphs........: 12
Ave. words per paragraph: 32.48
Spencer/McGee gradelevel: 6.42 // grade-level estimation algorithm
Etc.....

This gives one more room for additional stats.
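A stats panel like the one above can be computed on demand. This is a minimal sketch; the grade-level line is omitted because that algorithm is not specified here, and the paragraph rule (blank-line separated) is an assumption.

```python
# Sketch: compute an on-demand stats panel like the sample above.
# Assumes paragraphs are separated by blank lines; no grade-level metric.
def document_stats(text):
    words = text.split()
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return {
        "Total characters": len(text),
        "Total words": len(words),
        "Total paragraphs": len(paragraphs),
        "Ave. chars per word": round(sum(map(len, words)) / max(len(words), 1), 2),
        "Ave. words per paragraph": round(len(words) / max(len(paragraphs), 1), 2),
    }

def stats_panel(text):
    """Format the stats as dot-padded label/value lines."""
    stats = document_stats(text)
    width = max(len(k) for k in stats) + 1
    return "\n".join(f"{k:.<{width}}: {v}" for k, v in stats.items())
```

Wiring this behind a "Stats..." button, rather than per keystroke, matches the compromise discussed above.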


Regarding Live Reports: the delta-based approach is going to be more coding and potentially more buggy since things may get out of sync for conditions we fail to anticipate -- top

There are programming models that make developing incremental algorithms much easier.

If you know it's a requirement, you can certainly plan for it, emphasizing concepts such as GateKeeper. But I'm skeptical, without a coded demonstration, that the total result is as simple as it would be without it. It's another cross-cutting feature dimension to plan, code, include, and test for. Rarely are added cross-cutting features free. I especially expect relationship-oriented info, such as the grade-level metric in the sample above, to add complexity. Sure, it may be possible to rework the algorithm to be incremental, but I doubt it's trivial for most cases. If you borrow or buy an existing algorithm, most likely it's not written to be increment-friendly, because that's not the common way software is currently written and taught on this planet. Whether it "should be" or not is a different scope of "fix". I'm assuming we are focusing on handling specific projects as encountered and not on overhauling the profession. -t

I'm thinking more along the lines of Lucid or Bloom - i.e. programming models where incremental computation is the norm. You are right that cross-cutting features aren't free, but that cost doesn't need to be complexity. Incremental programming models will tend to constrain how side-effects are expressed or state is maintained. On the other hand, most relevant concerns are incremental - e.g. human perception, human communication, human relationships, physics. Expressing incremental algorithms for most metrics should not be too onerous.
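To make the tradeoff concrete, here is a hypothetical sketch of one incremental metric: a running word count updated from edit deltas instead of a full recount per keystroke. It assumes words separated by single spaces and ignores newlines; even this toy version needs the word-boundary bookkeeping that the delta-skeptical argument above is pointing at.

```python
# Hypothetical sketch: incremental word count maintained from edit deltas.
# Assumes single-space word separation; a real editor needs more cases.
class IncrementalWordCount:
    def __init__(self, text=""):
        self.text = text
        self.count = len(text.split())

    def replace(self, start, end, new):
        """Apply an edit delta: text[start:end] becomes `new`."""
        # Widen the affected span to the surrounding word boundaries so a
        # recount of just that window stays consistent with a full recount.
        lo = self.text.rfind(" ", 0, start) + 1
        hi = self.text.find(" ", end)
        if hi == -1:
            hi = len(self.text)
        old_window = self.text[lo:hi]
        new_window = self.text[lo:start] + new + self.text[end:hi]
        self.count += len(new_window.split()) - len(old_window.split())
        self.text = self.text[:start] + new + self.text[end:]
```

The boundary-widening step is exactly the sort of extra rule that full recomputation never needs.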


Most computation I deal with is about gathering and formatting live data (effectively, distributed queries or reports), and using it to make real-time decisions. Granted, not all formatting I perform is for human consumption. But the basic pattern remains.


Reports are the facial expressions of data. They may be needed to tell what's going on inside a complex system.


Reports provide data with a very low amount of coupling to the system holding the data. The user of the report does not need access to the system providing the report. The user does not need an access account nor a network connection to the system. It requires a great deal of support to set up and maintain rarely used accounts as opposed to generating periodic reports. If the data needs to cross department, agency, or corporate boundaries, then the maintenance and support required increases further.

GUI displays are nice, but they are not always the best solution to the problem.


How about we focus on the misuse of reports rather than the assumption that all reports are bad? In other words, patterns of improper use or overuse of reports.

''I think that topic would need another title. Perhaps 'ReportSmells' or 'MisuseOfReports'.''


OctoberZeroEight

CategoryBusinessDomain