Reports Smell

ReportingFrameworks

If your application consists primarily of reports, this is a smell because:

  1. It's an indicator of incomplete or vague user needs. Essentially, a request for a report means "Gather a bunch of data and show it to me nicely formatted". Fine, but what exactly do all the pretty rows and columns tell you? Discover the true "why" for the report, and the need for the report evaporates.
  1. Reporting tools often cast domain objects in the role of dumb data containers. Instead of having useful business-oriented behavior, classes that otherwise would define suitable behaviors are diminished to just carrying values from one location (usually an RDBMS) to another, such as a piece of paper. See AttributeShufflingReduction.
  1. There is a certain implication that the reports, once defined and written, never change and will always be what is necessary. Stories of reports that continue for years being generated and sent to people who never use them or even know why they are getting them are not uncommon.
  1. The typical reporting run, being a batch-and-queue process, can wreak havoc with the functioning of an interactive system while the reporting is occurring. Reports against databases often do expensive table scans, killing performance for users doing transactional work. Reports also exercise the object model in ways different enough from live transactional usage that the tradeoffs between what's right for reporting and what's right for interactive use are difficult to resolve. This is mitigated by having a DataWarehouse to do OnLineAnalyticalProcessing (in other words, reports); the separation avoids wreaking havoc on the OnLineTransactionalProcessing system.
  1. The request for a report is often a sign of an implicit idea that the computer can only do "data processing", and that the real analysis can only be done by hand. While that is still true for many classes of problems, the ability of programmers and systems to simulate and analyze is well advanced.
  1. Reports are often artifacts of a time before user interface hardware and software had reasonable formatting capabilities. Consider the low-resolution screens and limited screen prints of low-quality local printers, compared to the formatted output a centralized, high-quality line printer could generate. With modern user interfaces and hardware, the information is often presentable on screen, and if not, printing from the application in a suitably formatted, high-quality manner is quite easy.
  1. Typical reporting tools connect to the underlying data in a different way than the core application and thus introduce duplication. Every change to the data structure made to support application changes has the potential to require changes in the reports. In one case, a well-known reporting tool was deployed using its own SQL to query the relational database used by the application; when the underlying table structures and column definitions changed, the result was a cascading series of required changes to the reports, because they had their own SQL.
  1. Reporting tools compromise testability and maintainability for marketable features such as "easy to use", "no coding required", and so forth. This frequently results in the AutogeneratedStovepipeAntiPattern.
  1. It may be the case that behavior that would be useful in the application is relegated to the reporting system because of an assumption that it would be too performance- or resource-intensive to put it in the normal interface. Remember, optimization is not what you think it is. If the behavior is highly valued by the customer, and at least a spike solution has demonstrated genuine performance concerns, the development team should let the customer decide whether additional time spent modifying the implementation or solution to get acceptable performance is worth it, or whether to drop the requirement and relegate it to the reporting framework.
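The duplication point above can be sketched in miniature. This is a hypothetical example, not a depiction of any particular reporting tool: the report reuses the application's own query function instead of carrying its own SQL, so a schema change is absorbed in one place rather than cascading into every report.

```python
# Hypothetical sketch: the application owns one query function; the
# report only formats its results and carries no SQL of its own.
import sqlite3

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, "Acme", 120.0), (2, "Acme", 80.0), (3, "Zenith", 40.0)])
    return conn

def totals_by_customer(conn):
    """The application's query -- the single place that knows the schema."""
    return conn.execute(
        "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()

def report(conn):
    """The report formats the application's data; no duplicated SQL here."""
    return "\n".join(f"{cust:<10} {amt:>8.2f}" for cust, amt in totals_by_customer(conn))
```

If the `orders` table is restructured, only `totals_by_customer` changes; the report is untouched.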

I think specific scenarios should be presented rather than summarily saying that "reports are bad". Let's look at the alternatives first before we toss them. Also, the above seems to take a pro-OO/anti-RDB stance, for good or bad.

Agreed. We also need to see specific scenarios in user requirements where the creation of a report really is what the user wants and satisfies the requirements. As noted above, it is too often assumed that some user-level need for system visibility can be satisfied by a report. Better requirements and understanding of our customers are essential. Personally, I see teams far too frequently take the easy way out and generate dense and ill-conceived reports in response to a user request for a specific need. --StevenNewton

What if the only way you could get a word/line/page count and spellcheck on your Word document was to generate a report that was disconnected from the actual editing functions? If you had to manually compare the report window results with the document, make changes, then re-run the report to see the results? That's the kind of thing that happens in applications all the time.

Real-time info is not always practical to provide, especially if it sums and compares a lot of different things. For example, I have seen word processors that can provide a lot of stats about a document, such as number of paragraphs, average words per paragraph, average letters per word, average sentence length, and even an estimated grade level of the writing using a specific algorithm. It is not realistic to expect these to be re-calculated on every keystroke just so that they are there. Also, I am not sure if you are talking about a printed report or a screen-based report.

Can we see some measurements to demonstrate that on-the-fly maintenance of these statistics is a performance problem? Otherwise, beware PrematureOptimization.
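In that spirit, a quick measurement is cheap to write before committing to an incremental design. The sketch below is illustrative only (the document and word count are made up); it times a full-document recount to see whether per-keystroke recalculation is actually a problem.

```python
# Measurement sketch: time a full-document stats pass before assuming
# it is too slow to run per keystroke (beware PrematureOptimization).
import timeit

document = "lorem ipsum dolor sit amet " * 40_000   # roughly 200,000 words

def full_stats(text):
    words = text.split()
    return {"words": len(words),
            "chars": len(text),
            "avg_word_len": sum(map(len, words)) / len(words)}

seconds = timeit.timeit(lambda: full_stats(document), number=10) / 10
print(f"full recount of {full_stats(document)['words']} words: {seconds * 1000:.1f} ms")
```

If the measured time is well under a keystroke interval, the delta-based scheme may be unnecessary complexity.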

Either you recalculate the entire document on each keystroke, which is a performance risk at the least, or you use some kind of delta-based scheme. Clearly the delta-based approach is going to be more coding and potentially more buggy, since things may get out of sync under conditions we fail to anticipate. I estimate a good delta approach would take 10 times more coding/debugging. I tried to do something similar for accounting data-entry once, and it turned out to be more code and more rules than I anticipated. And, as a user, I usually don't need such a total-document stat feature on every single keystroke. Thus, the cost-to-benefit ratio is high (DecisionMathAndYagni). Sometimes MS-Word gets hurky-jerky if there are a lot of other apps or high-priority tasks running in the background. It may pause for a few seconds to re-calc the page numbers when I enter text or change the boundaries. I would rather it not do that at times. It is bad enough that people interrupt my train of thought while writing; I don't need the same from a PC. Anyhow, I would like to get away from word-processor scenarios and back to biz examples if possible, please. --top

Users don't need a per-keystroke response time. This sort of interactive report can be handled quite effectively by simply recalculating it a few seconds after the user has stopped doing anything. There are always compromises that can be made.
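That compromise can be sketched as a debounce: each keystroke resets a timer, and the recalculation fires only once typing has paused. The class and delay below are illustrative, not from any particular editor.

```python
# Sketch of "recalculate a few seconds after the user stops typing",
# implemented as a debounce timer.
import threading

class Debouncer:
    def __init__(self, delay, action):
        self.delay = delay      # seconds of quiet before recalculating
        self.action = action    # e.g. recompute the document statistics
        self._timer = None

    def poke(self):
        """Call on every keystroke; only the last call within `delay` fires."""
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.delay, self.action)
        self._timer.start()
```

A burst of `poke()` calls produces a single recalculation shortly after the burst ends.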

Such info is still a "report". Being on the status bar doesn't change its general nature. Also, it may make more sense to have some basic or common stats be almost-real-time, but fancier stats be accessible via a button where you would otherwise put such stats. Example:

Total characters........: 2403
Total words.............: 480
Total pages.............: 3
Ave. chars per word.....: 5.01
Total paragraphs........: 12
Ave. words per paragraph: 32.48
Spencer/McGee gradelevel: 6.42 // grade-level estimation algorithm
Etc.....

This gives one more room for additional stats.
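A stats panel like the one above can be computed on demand. This is a minimal sketch; the grade-level line is omitted because that algorithm is not specified here, and the paragraph rule (blank-line separated) is an assumption.

```python
# Sketch: compute an on-demand stats panel like the sample above.
# Assumes paragraphs are separated by blank lines; no grade-level metric.
def document_stats(text):
    words = text.split()
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return {
        "Total characters": len(text),
        "Total words": len(words),
        "Total paragraphs": len(paragraphs),
        "Ave. chars per word": round(sum(map(len, words)) / max(len(words), 1), 2),
        "Ave. words per paragraph": round(len(words) / max(len(paragraphs), 1), 2),
    }

def stats_panel(text):
    """Format the stats as dot-padded label/value lines."""
    stats = document_stats(text)
    width = max(len(k) for k in stats) + 1
    return "\n".join(f"{k:.<{width}}: {v}" for k, v in stats.items())
```

Wiring this behind a "Stats..." button, rather than per keystroke, matches the compromise discussed above.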


Regarding Live Reports: the delta-based approach is going to be more coding and potentially more buggy since things may get out of sync for conditions we fail to anticipate -- top

There are programming models that make developing incremental algorithms much easier.

If you know it's a requirement, you can certainly plan for it, emphasizing concepts such as GateKeeper. But I'm skeptical, without a coded demonstration, that the total result is as simple as it would be without it. It's another cross-cutting feature dimension to plan, code, include, and test for. Rarely are added cross-cutting features free. I especially expect relationship-oriented info, such as the grade-level metric in the sample above, to add complexity. Sure, it may be possible to rework the algorithm to be incremental, but I doubt it's trivial for most cases. If you borrow or buy an existing algorithm, most likely it's not written to be increment-friendly, because that's not the common way software is currently written and taught on this planet. Whether it "should be" or not is a different scope of "fix". I'm assuming we are focusing on handling specific projects as encountered and not on overhauling the profession. -t

I'm thinking more along the lines of Lucid or Bloom - i.e. programming models where incremental computation is the norm. You are right that cross-cutting features aren't free, but that cost doesn't need to be complexity. Incremental programming models will tend to constrain how side-effects are expressed or state is maintained. On the other hand, most relevant concerns are incremental - e.g. human perception, human communication, human relationships, physics. Expressing incremental algorithms for most metrics should not be too onerous.
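To make the tradeoff concrete, here is a hypothetical sketch of one incremental metric: a running word count updated from edit deltas instead of a full recount per keystroke. It assumes words separated by single spaces and ignores newlines; even this toy version needs the word-boundary bookkeeping that the delta-skeptical argument above is pointing at.

```python
# Hypothetical sketch: incremental word count maintained from edit deltas.
# Assumes single-space word separation; a real editor needs more cases.
class IncrementalWordCount:
    def __init__(self, text=""):
        self.text = text
        self.count = len(text.split())

    def replace(self, start, end, new):
        """Apply an edit delta: text[start:end] becomes `new`."""
        # Widen the affected span to the surrounding word boundaries so a
        # recount of just that window stays consistent with a full recount.
        lo = self.text.rfind(" ", 0, start) + 1
        hi = self.text.find(" ", end)
        if hi == -1:
            hi = len(self.text)
        old_window = self.text[lo:hi]
        new_window = self.text[lo:start] + new + self.text[end:hi]
        self.count += len(new_window.split()) - len(old_window.split())
        self.text = self.text[:start] + new + self.text[end:]
```

The boundary-widening step is exactly the sort of extra rule that full recomputation never needs.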


Most computation I deal with is about gathering and formatting live data (effectively, distributed queries or reports), and using it to make real-time decisions. Granted, not all formatting I perform is for human consumption. But the basic pattern remains.


Reports are the facial expressions of data. They may be needed to tell what's going on inside a complex system.


Reports provide data with a very low amount of coupling to the system holding the data. The user of the report does not need access to the system providing the report. The user does not need an access account nor a network connection to the system. It requires a great deal of support to set up and maintain rarely used accounts as opposed to generating periodic reports. If the data needs to cross department, agency, or corporate boundaries, then the maintenance and support required increases further.

GUI displays are nice, but they are not always the best solution to the problem.


How about we focus on the misuse of reports rather than the assumption that all reports are bad? In other words, patterns of improper use or overuse of reports.

''I think that topic would need another title. Perhaps 'ReportSmells' or 'MisuseOfReports'.''


OctoberZeroEight

CategoryBusinessDomain