Node Js And Hof Discussion

NodeJs (a JavaScript platform) and HOF Discussion

Continued from ArrayDeletionExample


A striking demonstration of eval()'s insufficiency, and of the power of closures, is NodeJs, a platform for server-side JavaScript. The genius of Node is that every single I/O call is non-blocking; rather than returning its result directly, each call takes a callback function, reminiscent of ContinuationPassingStyle. For example, reading from a file looks like this:

  fs.readFile("path/to/file.ext", function(err, data) {
    // do stuff with file data in here
  });

Because every call is non-blocking, like the above, Node essentially has concurrency ForFree. It's a fantastic way to write all sorts of applications, especially servers. And it's absolutely 100% impossible for any eval() system to achieve this kind of evented structure (at least, without actually being a closure system in disguise).
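To make the callback shape concrete, here's a minimal sketch (hypothetical function names, no real I/O) contrasting a direct-return function with its continuation-passing equivalent, using the same error-first callback signature Node's I/O functions use:

```javascript
// Direct style: the result comes back as a return value.
function addDirect(a, b) {
  return a + b;
}

// Continuation-passing style: the result is delivered to a callback.
// Node's I/O APIs use this shape, with an error-first argument.
function addCps(a, b, callback) {
  callback(null, a + b);
}

var direct = addDirect(2, 3);        // 5
var viaCallback;
addCps(2, 3, function(err, sum) {
  viaCallback = sum;                 // 5, delivered rather than returned
});
```

With real I/O the callback would fire later, after the operation completes, rather than immediately as here; the calling code is written identically either way.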

To show exactly why eval() can't possibly do this, here's a code example from a simple Node server:

  http.createServer(function(req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'});
    withImage(function(img) {
      res.end("<img src='" + img + "'>");
    });
  }).listen(8000);

Notice that the res variable is accessed in both the outer and inner scopes. Without LexicalScoping and proper closures, that variable would be gone by the time withImage() fetched the image and called the inner callback. With those two things, this code works perfectly and is quite natural.
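The scoping point can be shown without any server at all. In this sketch, withImage is a hypothetical stand-in that delivers its result via a callback (here it calls back immediately, where a real one would call back after fetching); the outer variable stays available inside the inner callback because the function closes over it:

```javascript
// Hypothetical stand-in for withImage(): hands a value to a callback,
// as an async image-fetcher would.
function withImage(callback) {
  callback("cat.png");
}

function handle(requestId, done) {
  var res = "response for " + requestId;  // outer-scope variable
  withImage(function(img) {
    // `res` is still in scope here, thanks to LexicalScoping and closures.
    done(res + ": " + img);
  });
}

var output;
handle("A", function(text) { output = text; });
// output is now "response for A: cat.png"
```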

-DavidMcLean

I've almost never had a real problem with concurrency for web apps or client/server architecture. If it becomes an issue, then I rewrite it to use the concurrency-handling capability of the database, such as ACID and transactions. You appear to be using a file system for data "storage" when a database would perhaps be more appropriate. If you need to store content such as photos with concurrent uploads, then generate a unique ID first in the image or article tracker or record (usually article ID), and use that as part of the destination file name to avoid user collisions. (It's also possible to store images directly in a database, but that's not always an option. There are other approaches, but I'd have to see the domain details to recommend something specific.) It's common for us CBA developers to rely on the concurrency engine in the RDBMS for concurrency needs. Why do you HOF fans keep trying to sell refrigerators to us Eskimos? -t

Ah, figures. The fact that I mentioned files at all clearly means I advocate them over databases for all data storage. I merely selected the file-read function to demonstrate in a simple way the callback-oriented I/O; in retrospect, this was a very bad move considering to whom I'm talking.

Node's non-blocking I/O doesn't just apply to files. It works equally well with any kind of I/O, including databases, and it does so using the exact same callback design. That withImage() function actually fetches an image URL from the Google Images API, for example, not from anything on the local filesystem (unless your local filesystem just happens to be a Google datacentre).

Actually, given what I've seen of TableOrientedProgramming, I think Node would be an excellent fit for it. TOP clearly is full of database queries. You know how whenever you make a database query there's network overhead? When you use Node, that network overhead is put to good use. While it's waiting on the results of a query, your app is free to receive other requests, perform other computations, and so on. Node's weak point is stuff that requires a lot of processing directly in-app, rather than through I/O, since its magical concurrency design doesn't inherently include threads. Because TOP involves having the database do as much processing as possible, Node's weakness is dodged nicely.

-DavidMcLean

I'm not sure what you mean by "network overhead" here.

Re: "While it's waiting on the results of a query, your app is free to [do other things]". That hasn't been a common (explicit) need for the web-apps I've worked on. But it can possibly be handled with "hidden" frames or iframes. Web pages make it pretty easy for the user to "spawn" other pages if they want while waiting. One can put up a page that says, "While waiting for your request to process, here are some other articles/topics you can peruse". In-line ads already do this pretty much. Just make sure to set "target='_blank'" so the original "waiting" page is not wiped out.

On client-server apps, one typically makes the long process spawn a secondary process and "status screen" that says something like, "Your query/report is being processed, please wait...". When it's ready, then a notice and a "View Report" button comes up on that secondary screen.

You send a request to your database. There is some delay, due to the time taken to deliver the request and response across the network. You receive a result from your database. Network overhead. (Also, there's overhead in the DBMS itself, especially as queries grow more complex. There's plenty of overhead in the whole process for Node to exploit.) Under most platforms, that network overhead slows down your app, as synchronous code has to block while it waits for results; under Node, overhead gives your app the chance to get some work done, processing events from the queue.

You say there hasn't been an explicit need for concurrent behaviour in your Web apps, but in fact all Web apps need to be concurrent, because they will inevitably receive requests from more than one user. The "explicit" part, I think, highlights that you've missed the point a little: Why should you have to be explicit? It's a Web app; of course you're going to want decent multiuser handling. Node's evented structure means that you get multiuser concurrency in your app server implicitly, without needing to give it special consideration.

While I cannot condone using frames to implement asynchronous Web-apps (you do know what AJAX is, right?), the client-server technique you describe is a sane one. However, the idea of offloading complex processing to a separate process is the crux of Node's nimble event-loop, so that design automagically works in Node without much special server setup! The platform makes it extremely natural and clean to set up many concurrent situations, including the one you describe.

-DavidMcLean

Maybe if we explore some kind of semi-realistic scenario. The devil's in the details of the domain needs. -t

You'd like a specific scenario? Okay. Here's a simple example:

  var db = require('db');
  var http = require('http');
  var s = http.createServer(function(req, res) {
    res.writeHead(200, {"Content-Type": "text/html"});
    db.query("SELECT * FROM tweets", function(err, rows) {
      for (var i = 0; i < rows.length; i++) {
        var tweet = rows[i];
        res.write("<p>" + tweet.text + "</p>");
      }
      res.end();
    });
  });
  s.listen(8080);

The above code, based somewhat on an example program from the book Node Up And Running, implements part of a Twitter clone in Node; the design is fairly conservative so as not to distract from the following explanation of the eventing process (a production version would almost certainly use some HTML templating system, rather than generate HTML directly inside the query callback, as this example does, for instance). This basic code concept would be perfectly applicable to report-writing business applications, among others.

Let's look at how the above code is processed, under Node. When the application starts, practically all of the code shown above is run almost immediately. This "first pass" in Node usually does very little actual processing; in this application, the first pass sets up some callbacks and tells Node to listen on port 8080. After the first pass completes, in any Node application, Node will stop and wait for events to show up in its event queue.

Now our server is up and ready to use. Let's have client A make an HTTP request. The request arrives as an event in Node's queue; Node dequeues it and runs the server callback, which writes the response headers and issues the database query. Issuing the query doesn't block: the query callback is registered, the server callback returns, and Node goes back to its queue.

Now, one of two things can happen next:

  1. Client A's query results come back first. Node runs the query callback, writes out the rows, and finishes A's response.
  2. Another client, B, makes a request before A's results arrive. Node handles B's request exactly as it handled A's, issuing B's query while the database is still working on A's.

That about covers the Node eventing model. The most important point is that the only time Node isn't doing something is when there is nothing to do, when there are no events waiting in the queue to be processed. Node never stops to wait for a database query, or file read, or call out to a shell script, or any of those things. It's essentially always getting something done.
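The eventing cycle described above can be modelled in miniature with a plain array standing in for Node's event queue. This is an illustrative toy, not Node's actual internals:

```javascript
// Toy event loop: callbacks accumulate in a queue during the "first pass",
// then are drained one at a time. Nothing here blocks.
var queue = [];
var log = [];

function emit(callback) { queue.push(callback); }

// First pass: set things up, schedule events, do no waiting.
log.push("listening on 8080");
emit(function() {
  log.push("request from client A");
  // Issuing A's "query" just schedules another event; it doesn't block,
  // so client B's request can be handled in the meantime.
  emit(function() { log.push("query results for A; response sent"); });
});
emit(function() { log.push("request from client B"); });

// Event loop: run whatever is in the queue until it's empty.
while (queue.length > 0) {
  queue.shift()();
}
// log order: listening, request A, request B, then A's query results --
// B's request was served while A's "query" was outstanding.
```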

It should be apparent from the above example why Node has such good concurrent performance, as well as why it's weakened by applications needing significant processing directly within Node rather than through external processes.

-DavidMcLean

No no no no. I'm not looking for a technical-only demonstration. I want to see it solve a stated semi-realistic business need. Something along the lines of, "I'm a warehousing company and need a report that shows both the dfasdf and the asdfasdf every third Tuesday of the month because that's when our fasdfjk deliveries roll in. Here is how to best satisfy this need..." Your demo is more along the lines of "I can print 'Hello World' 0.02 seconds faster!" Which may be neat in a MentalMasturbation kind of way, but is not shown solving a real need out in the field.

You asked "How exactly does it improve performance noticeably?", so I assumed you wanted an explanation of how exactly it improves performance noticeably. Regardless, the example code is easily applicable to business applications, as I already mentioned; it's essentially a (very simple) report writer, much like Challenge #6. (I chose this specifically with the expectation that you would consider report writers business apps, so I'm a little surprised at your response.)

It seems odd to complain that my demonstration shows a way to produce results faster, when I'm demonstrating a concurrency system. Replacing your strawman description with something actually describing what I've just shown, such as "I can print your business's reports seconds-to-minutes faster", should show that what I've provided is actually a highly desirable quality in a business app.

-DavidMcLean

Sorry, I don't see what bottleneck it's allegedly plugging. What exactly is serial before and now parallel under your gizmo? The delays in most custom web apps are NOT caused by slow server app processing, but by the delay of bytes across the network "wires", and secondly by the database processing. Now, databases can and do use parallelism, but you have to have the hardware and indexes etc. set up properly, which will probably be an issue in any shared, multi-user data "storage", and databases are probably more mature in that department. Most of the same kinds of decisions and trade-offs will be relevant to anything that fulfills that role. Parallelism and concurrency are often not a free lunch: they can require more discipline and planning. Let's not spend our time coding/tweaking around with such if it's not necessary, because single-threading and serial processing are usually easier to debug and grok. Thus, one shouldn't parallelize willy-nilly.

Further, most web servers already do parallel processing, because each user's request is or can be a separate process. Splitting further at the sub-user level doesn't gain one anything. Say there are 16 concurrent users and 8 processors. The web server (IIS or Apache etc.) will typically split these 16 users among the 8 processors such that two users (or requests) are assigned per processor. If you split at the sub-user level also, then you simply have more little processes waiting in line. 10 2-inch processes waiting in line is not going to be better than 5 4-inch processes waiting (especially since they can be allocated to different CPUs when ready). Now, when typical servers have say 100 processors, it may start to be a help, but the app code (outside of queries) is generally not doing anything processor-intensive anyhow, such that 90 of those will be doing NO-OPs (or working on database requests). We're not calculating pi to 500 million decimal places or predicting the weather 2 weeks ahead. If your app code (outside of queries) is the performance bottleneck, then usually you are doing something wrong, or at least the hard way. Typically the sin is not taking advantage of the database's capabilities and instead doing mass DatabaseVerbs in app code. CBA app code should mostly be like a building receptionist: guiding inputs, outputs, and service requests to their proper destinations based on the business rules. The receptionists should not be processing mass piles of paper or forms. If they are doing those kinds of tasks, then you are misusing them.

A similar issue comes with queries per unit of time. Sub-user parallelism may be able to submit more queries per second, but if all the other users are also submitting queries, then we are right back to the same kind of problem. From the database side, the number of queries it can process per second (from multiple users) is going to be a far, far bigger factor than how many queries we can submit to the database (queue) per second. It doesn't address the actual/typical bottlenecks. It would be roughly comparable to bridge toll booths. We can make the highway going up to the booths wider (more lanes, say 12), but if we only have 6 booths actively processing (the database), then the same number of cars are still coming out the other end per minute (after they pay their tolls). We are not noticeably reducing a person's trip time. The wider feeder highway is mostly a waste of space and tax money. If the traffic is light enough not to bottle up the toll booths, then most likely the 12 feeder lanes will be sparse and wasted, such that they are not helping the lighter-traffic trip times either.

It's rarely economical to spend your resources/time on the non-bottlenecks.

[Top, database processing is not the bottleneck per se -- even low-end DBMSs on low-end hardware can handle dozens of simultaneous queries. The bottleneck is within applications designed to work serially: They wind up waiting for apparently unrelated parts of themselves. This is exemplified by having to wait for content A to load before you can work on unrelated content B, where users are typically forced to stare at some irritating spinning "please wait" icon or irrelevant substitute content. It's because all the application's processing is serialised, thus forcing the user to perceive the cumulative effect of various inevitable small delays. Concurrent event-driven models essentially eliminate these.]

[What's notable about using concurrent event-driven models with proper support for higher-order functions -- as with AJAX and appropriate client-side libraries, Node.js, Windows 8 modern-style apps, and so on -- is that massive concurrency is essentially free. Once you grasp the underlying approach, it is no more difficult to develop and debug them than conventional "serial" applications, but the improvement in responsiveness and fluidity represents not only a better user experience, but a potential competitive advantage. It doesn't matter if you still develop apps that permit the same overall throughput as your competitor (which, of course, might even be a fellow developer within your department.) If her apps are perceived to be fluid and responsive, but your apps are perceived to be clunky because they stall and/or show a tumbling hourglass, who's going to win?]

You must be doing something weird. In my typical biz web apps, the database is usually the bottleneck. If I comment out the database portions, the rest runs in a snap. I do this frequently when tuning page esthetics. Browsers already have built-in parallelism and speed short-cut mechanisms, and one can leverage this. Your parallelism GoldPlating is a waste of programmer time. (There are other ways to speed up perceived and real web-page rendering without JS and HOF's, but we are wandering off topic here.)

And parallel algorithms are in general more difficult to debug because the order can be different on each run. It's like trying to do science in which you cannot isolate one variable because the other variables keep changing upon each test. HeisenBug risk. We can limit these problems to some extent by making certain assumptions and sticking to certain rules, but we are then accepting down-sides to gain the benefits. If the benefits are small for a particular situation, then the down-sides are not worth it. Even if, in theory, two processes/events should be "parallel safe", clients (browsers & GUI engines) are often buggy such that events can affect each other. I don't want to take my chances with potential HeisenBugs unless the payoff is fairly large. -t

I think you'll find the database isn't the bottleneck in your applications; waiting for the database is. If you use an event-driven concurrency system, you never wait for the database, because there's more important stuff to do. As for difficulty in debugging, the callback-oriented structure means that coding under these systems is actually very similar to coding purely serial code. (Because they're still single-threaded, you don't have to worry about thread safety; many threading-related issues vanish when using single-threaded concurrency, partially because closures keep state encapsulated and local.) However, in evented-callback systems, you can be sure that any particular piece of code is only delayed by the things it specifically depends on; if a query is needed to show page A but not page B, then page B won't be delayed by attempts to construct page A.

Heck, even constructing a single page can have fewer delays. Suppose page A requires several database queries (Plus maybe some other external source. Perhaps some HTML sourced from the Google APIs?). In serial code, you'd make each of those queries in sequence, one by one, waiting for each query to complete before making the next one. With concurrent code, you can make all of those queries at once; databases themselves can easily handle multiple queries at once, so you'll get your results much faster than if you made the queries one by one.

-DavidMcLean

I'm sorry, I still don't know what you are talking about. What is the "more important stuff to do"? Running a Honey Boo Boo video while the report is being generated? If I as a user click "View Report", I don't want to see Honey Boo Boo because I didn't ask for Honey Boo Boo. The button lied. (Don't need HOF's for that anyhow.) The bottom 10% of users get confused and call the help desk if too much is going on at the same time. If this is about making dancing spam more "efficient", I'm not interested in that topic today.

At least in HofPattern, the multi-panel real-time status monitor scenario was something of utility. We just disagreed about whether a JavaScript client should be the reference point for measuring "good". I'd like to get away from GUI-intensive scenarios if possible because the trade-offs depend heavily on the client technology being used or available, which greatly complicates the comparison. A GUI of some sort is fine in a scenario as long as it doesn't become the focus point of the differences. If HOF's are mostly about making GUI's/UI's "better" in CBA, then perhaps we should spawn a more narrow topic on that alone.

No, stupid. Like I've already explained, basically anything the app needs to do is more important than sitting around waiting for a query. Receiving a request from another client, perhaps. Or finishing off a request from another client, because the results of that client's query just came back. Or, to use an example from my previous paragraph that I was sure you'd love, using the time waiting for one database query to make another database query. I have no clue why you're discussing advertising so much; I'm beginning to suspect it's a strawman tactic. -DavidMcLean

What biz scenario would you want the app to do that? Why get report B if I, the user, only asked for report A?

I didn't actually say that, but there's an obvious reason you'd want that to happen: if a different user asked for report B, then of course you'll want to retrieve that report as well.

Use query results caching. Most RDBMS support it. Even if it didn't, I don't see what you are trying to do from a business reason perspective.

… what does query-result caching have to do with anything we've been discussing, in the slightest?

It should really be obvious from a business perspective why you'd want this. It both speeds up construction of a single report and allows for multiple users to request reports simultaneously (or for a single user to request several different reports in separate tabs, which is equivalent from an HTTP perspective). Unless you actually want slower response times from your software, the value of evented-I/O concurrency should be apparent at this point. -DavidMcLean

It sounds to me like a similar issue in the HofPattern topic re the multi-panel real-time monitor screen matrix scenario (AKA: Brady Bunch intro). However, I cannot be sure without specifics. There are different ways to skin the cat, and the choice depends on the domain details/requirements. If we are forced to use JavaScript as the client, then yes we may have to pretty much use HOF's, but that's a client-specific issue and I don't want to explore client specifics/limitations, I want to explore solving CBA problems in a more general sense, not compare browsers to VB to PowerBuilder to Delphi etc. Other than that, I have no idea what the hell you are getting at. You called me "stupid" and I am itching to retaliate at this point. Where's my breathing exercises link? Break your scenario down step-by-step: who, what, when, where, and why. See UseCase. If you want to communicate, roll up your sleeves and do it right. If it turns out your claims are client-specific, then I am bailing out.

The "stupid" comment was in direct response to your suggestion that the only useful thing for an app to do while it waits for queries is play a Honey Boo Boo video. I mean, come on. Why would you jump immediately to something as random and worthless as that, when we'd already gone over a lot of more useful things? It's either stupidity or trolling, and I chose to attribute to ignorance what I could instead have attributed to malice.

I don't care what we're using on the client, and it's irrelevant to the topic at hand. Nothing about anything we've mentioned is client-specific. Since we've been talking about Web apps, the client probably would indeed use JavaScript, but there doesn't necessarily need to be any client-side scripting going on in these apps. Note that Node can do stuff other than Web apps: There are libraries for building a more traditional desktop GUI, the ability to access stdin and stdout for writing command-line apps, as well as provision for TCP sockets such that HTTP isn't the only option for servers. It's a very flexible platform, although Web apps are the usual choice.

Concurrency through evented I/O is a general pattern. It doesn't really need to be plugged into specific UseCases to be demonstrably useful; it has already been explained how evented I/O can improve the performance of a report-writing application, however. -DavidMcLean

I'm sorry, I don't see what's explicitly being improved. You seem to be making some unrealistic assumptions. Parallelism alone is no guarantee of speed improvement. That's why I want to walk through a specific scenario. You are being too general and vague. I'm fucking tired of foo/lab examples of FP being great. I want real beef from a real goddam cow!

I already gave a simplified-but-practical example of how Node.js uses evented I/O to achieve improved concurrent performance, using a business-domain application (a report-writer). Did you not understand how it works? I'll try to explain in more detail, if required. -DavidMcLean

That's not a UseCase. There are ways to run multiple threads without having to use (exposed) HOF's on clients and/or servers. You haven't ruled those out. Why are they "bad"? And cranking up the number of threads if the bottleneck is the RDBMS will do us no good.

Because they're multiple threads in the first place, which raises concerns of thread safety, race conditions, and so on. Evented I/O is usually single-threaded (Node is), making it simpler to work with. The preceding description of how Node's eventing system works may be worth another read; if you're still equating it with multi-threading, you haven't really got the basic concept. And the bottleneck isn't the RDBMS, as we've explained: it's local app code waiting for the database.

And the fact that these concurrency systems use explicit anonymous functions isn't a weakness. It's a strength, because functions are very easy to manipulate to do cleverer stuff. For example, retrieving two database queries in parallel to use in one report, a possibility I mentioned above, would be rather convoluted and messy using pure callbacks: You'd need to code up some reference-counting junk and it'd be annoying. However, because higher-order functions are so general and flexible, you can write libraries to wrap up these sorts of concurrency patterns. In fact, if I wanted to implement the above two-query thing, I wouldn't even consider writing the callback structure myself manually. I'd just load up the async library and do this:

  async.parallel({
    users: makeQuery("SELECT * FROM users"),
    posts: makeQuery("SELECT * FROM posts")
  }, function(err, results) {
    var users = results.users;
    var posts = results.posts;
    // can do whatever you want with these two now
    res.write(aReportMadeUsing(users, posts));
    res.end();
  });

Bam. Two queries performed in parallel, used to construct one report. Tidy and intuitive. It'd be impossible to provide nice libraries like async.js if Node's concurrency didn't use handy things like higher-order functions. -DavidMcLean
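For illustration, here's roughly what such a library helper has to do internally. This is a minimal, hypothetical version of async.parallel (the real async.js is far more thorough), shown with a stubbed db object so it runs standalone:

```javascript
// Stubbed database: calls back immediately with fake rows. A real driver
// would call back later, when results arrive over the network.
var db = {
  query: function(sql, callback) { callback(null, ["row for: " + sql]); }
};

function makeQuery(sql) {
  return function(callback) { db.query(sql, callback); };
}

// Minimal parallel(): start every task, count completions, and call
// done() exactly once, with either the first error or all the results.
function parallel(tasks, done) {
  var keys = Object.keys(tasks);
  var results = {};
  var pending = keys.length;
  var failed = false;
  keys.forEach(function(key) {
    tasks[key](function(err, value) {
      if (failed) return;
      if (err) { failed = true; return done(err); }
      results[key] = value;
      if (--pending === 0) done(null, results);
    });
  });
}

var report;
parallel({
  users: makeQuery("SELECT * FROM users"),
  posts: makeQuery("SELECT * FROM posts")
}, function(err, results) {
  report = results;  // both result sets available together here
});
```

The reference-counting (the pending counter) is exactly the "junk" mentioned above; writing it once inside a higher-order helper means application code never has to.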

Usually an SQL JOIN or UNION is done to "combine two queries". The RDBMS can potentially parallelize multiple sub-queries. Also, multiple different techniques can implement a parallel "makeQuery" function. Bam! Granted many existing web frameworks and languages don't make doing such very easy, but that's likely because the need is not very common. The few times I can recall when I couldn't use JOIN or UNION to get the database to do it, the queries had "lopsided" profiles such that parallelizing them would not double the speed. For example, one may take 500 milliseconds and the other take 50 milliseconds. A non-parallel version would then take 550 ms and the parallel version would take 500 ms (under ideal conditions). That's hardly enough savings to bother in most cases. Optimizing the graphics on the page would probably give the app more of a boost per time spent, and keep the code simpler. Further, if the server is taxed, it may not be able to parallelize them anyhow, and/or they could end up competing for the same resource, such as disk or network I/O such that they end up waiting on each other anyhow. A lot of circumstances would have to line up just right to get a noticeable boost. If you look at the math in context of real systems and real bottlenecks, parallelism is often over-rated for CBA. I vaguely remember one profiling expert saying that as a rule of thumb, in production you get about 20% to 40% of the theoretical maximum of the savings. Thus, if "unstacking" two queries of the same size could in theory boost the speed from 2000 ms to 1000 ms, then the typical actual average would be something like 1700 ms (1000 + (1000 - 30% * 1000)). Spending that unstacking time tweaking with the query statements or indexes may give more speed per programmer time.

And databases are starting to work parallelism into their Stored Procedure languages. See also example "fern01" later.

makeQuery() isn't a parallel function. It'd be defined like this:

  function makeQuery(sql) {
    return function(callback) {
      db.query(sql, callback);
    };
  }

And you still seem to be assuming the bottleneck is the database itself. It's not. The app code that has to wait for database results is. When you use evented I/O like Node, your app code doesn't have to wait for database results. Therefore that bottleneck is reduced. It's fairly simple, really. -DavidMcLean

What else is it going to do during that time? If the user asks for Report X by pressing the Report X button, then the app has to run the necessary query(s) for Report X before delivering Report X to the user. Thus, either the user waits for the database to complete its job, or pressing Report X does something else besides (in addition to) deliver Report X, which would make the button a liar. Thus, it's either Lie or Wait. There is no 3rd option known to mankind. Maybe it can run Seti@Home while waiting so that aliens can answer that difficult question. (Seti@Home is a different app, but maybe you mix and match in weird ways such that your vision of "application" differs greatly from mine. It reminds me of the old joke: "The Emacs operating system needs a better editor.")

Example: Frame-mania

  -----------------
  [Run Report A]
  -----------------
  Report B is finished. [View]
  -----------------
  Report C is running. [Cancel]
  -----------------
  [Run Report D]
  -----------------
  [Run Report E]
  -----------------
  Report F is running. [Cancel]
  -----------------
  [Run Report G]
  -----------------
  Report H is finished. [View]
  -----------------
Compare ordinary blocking code with evented code. With a blocking call, the debug line prints only after the query has completed:

  results = db.query("SOME QUERY HERE");
  print("DEBUG: ran query");
  buildReportWith(results);

With an evented call, db.query() returns immediately, so the debug line prints before the report is built; the query callback runs later, when the results event arrives:

  db.query("SOME QUERY HERE", function(results) {
    buildReportWith(results);
  });
  print("DEBUG: ran query");