File Tree Meets Relational

Based on conversation snippet from RelationalTreesAndGraphsDiscussionTwo


Everything is convertible to everything else with enough effort and perhaps enough copying. But this is obviously far from the ideal. If we have 30 User Defined Structure Types and they lack relational operators, then building something for each could require up to 30 x R operators where R is the number of relational operators. Thus, if we assume 15 relational operators, we have up to 450 operators to implement. And this says nothing about efficiency, concurrency, etc, as our million-element stack report example illustrates. -t

If nothing has been said about efficiency and concurrency, then there isn't much reason to believe they suffer. And why would I implement 450 relational operators when I could write simple functions to construct relations from values then use the resulting relations? That way I'd need at most 30 such functions, likely fewer with GenericProgramming and composition.

A 'real world example? I want to perform relational operators on my existing file system'. I know of no easy way without mass periodic copying or slow iterative loops. Maybe there is a way to hack with the OS to maintain indexes automatically or the like, but this is busting encapsulation, making implementation ADT swapping problematic. We'd have to stick with *just* POSIX-like or FTP-like commands if we want to stick to the "purer" ADT model, and thus no internal hacks. -t

I imagine that a FileSystem (an entity that mutates in response to commands) differs in many critical ways from DomainValues, especially including immutability and the ability to receive commands. As noted in PROOF THREE and PROOF FOUR, above, DomainValues are never truly encapsulated. Since DomainValues are not encapsulated, then something about your argument is incorrect: either FileSystem DomainValues need to be immutable constructs (like versions or snapshots) that can be fully observed and processed without any use of 'commands', or the FileSystem mustn't be a DomainValue.

No, it does not significantly differ from any other database. Your oddly-defined DomainValue double-speak will not save you.

A 'database' in its common usage has some sort of extrinsic identity and state, which means it is not a value and is therefore not a DomainValue. Also, I don't really need "saving" when it's your arguments that are self-destructing based on equivocation fallacies.

We have attributes and values and we want to search, sort, join etc. Show your grand proofs in action. Typical file system attributes include:

What is it you think I need to show? FileSystem is not a DomainValue. Neither is a Database. It is neither easier nor more difficult to add an index to a FileSystem than it is to add one to any other RDBMS.

Joining across DB brands, or even instances of the same brand, can indeed be a pain largely because of the separation of query and implementation. The typical query interface does not provide enough integration to draw up some form of efficiency, resulting in working copies and sequential processing. It's a similar problem to the "too much encapsulation" issue raised in the parent topic. Maybe solving it for one will solve it for another.


Merging RDBMS technology with FileSystems is not new. There have been a number of projects that either use an RDBMS (or similar machinery) to house what would otherwise be "file" data in a more or less structured and query-able fashion, and/or to maintain query-able meta-data. Some have only been academic experiments, others -- like Pick -- have achieved considerable commercial success. I currently have a student producing a comprehensive survey of these, along with some experimental practical work of his own. I'll encourage him to make it available on the Web, with a link here, when it's done.

See http://en.wikipedia.org/wiki/Pick_operating_system


Despite that I believe the example contrived (no FileSystem, not even a versioned one, would be represented with each FileSystem state as a DomainValue), it can still serve as a demonstrative example of how indexing is achieved. Just to be clear, though, I'm not endorsing representing FileSystems as DomainValues. By no means is a FileSystem a measure, assignment, etc.

Anyhow, I'll be back.

Hell no, I'm not voting for Arnold again.

This explanation is split into three parts: Structure - an explanation of what the FileSystem looks like externally and internally. This is explained as a reference. It is not the only possibility, but is rather one selected to be complex enough for realism but simple enough that explaining it doesn't cause me any headaches. Indexing - an explanation of how the FileSystem is indexed, how those indexes are declared and maintained. Once again, this is but one design, though it is a decent one. Comparison - a comparison between this design and what TopMind believes to be superior.

Structure:

The filesystem tree structure used for this simulation is:

MD = (create:Time modify:Time) ;; non-recursive
FS = dir(meta:MD content:{map string=>FS})|file(meta:MD content:String) ;; recursive, hierarchical
A 'map' should just be considered shorthand for a relation (key:T1 val:T2) with the first as a candidate key.

This type separates the directories from files, making them distinct types that do not overlap - a feature common to many FileSystems (and one that I've never liked about them) but all the better for simulation. 'MD' represents metadata about each file, and could be extended as necessary. Each directory contains a relation of string (filename) to another 'FS' (that is, a file or another directory). Files themselves do not know their names; the name for a file is a property of the directory. Files are not "linkable" in this FileSystem because there is no indirection between a file and its content. Hard-linking could be added via an indirection (file->inode->content), and soft-linking via an extra type (symlink->path), but I'd rather not deal with them at the moment since they'll confuse the issue of sharing.

At this point the filesystem looks something like this:

. . dir(meta:(create:0 modify:N)) . .
. . . |etc. . . |usr. . |home . . . .
. dir(...). . dir(...). dir(meta:(create:3 modify:N))
. . . . . . . . . . . . |you. . |me .
. . . . . . . . . . . dir(...). dir(...)
. . . . . . . . . . ./fA. \fB . /fC .\fD
. . . . . . . ."StringA" "StringBC" "StringD"

Note that some sharing may occur under-the-hood that isn't exposed to users; for example, your 'fB' and my 'fC' files may share the same contents ("StringBC" in the above diagram) even though they have different access permissions, create+modify metadata, and so on. This is value-sharing, not linking, which means if you modify your 'fB', it is imperative that my 'fC' remains the same. Value sharing is a logical copy, and is traditionally achieved by CopyOnWrite for shared structures (which the FileSystem can distinguish via reference-counts or via marking a bit to indicate that a structure has been shared at least once then performing GarbageCollection). I emphasize this because the distinction between forms of sharing has repeatedly been a point of confusion for TopMind (who thinks 'normalizing' is about value sharing, according to his own comments in RelationalTreesAndGraphsDiscussionTwo, and who doesn't seem to grok that shared values have different mutate characteristics than shared objects). The FileSystem is allowed to share structure under-the-hood to save space and indexing, and users won't ever be aware of this sharing except to potentially have an understanding about the performance issues in space and time when copying large values (like MP3s, videos, etc.).

Value-sharing of directory structures is also possible, though is hindered greatly if maintaining the 'create' and 'modify' times. That is, if you copied the whole '/usr' directory into your '/home/you' directory, it could have been a completely logical copy except the directory and file "meta" contents, which need to be updated so they have the most up-to-date create+modify time. The fact that even a simple 'copy' operation results in a huge meta-data 'mutate' is among the reasons that FileSystems aren't DomainValues. OTOH, if working with a versioned FileSystem, one where each update results in a new 'FileSystem value' and one can look back at many past versions of the FileSystem, then a great deal of this directory structure can easily be shared across versions of the FileSystem (and, indeed, that is how (some) versioned FileSystems are implemented; others use snapshots).

Having more value-sharing helps enforce and clarify the distinction between trees-as-DomainValues vs. trees-as-mutable-objects or hierarchical-tree-structured-data. So, for the sake of introducing as much value-sharing as possible, I'll go ahead and treat the above as a versioned FileSystem. Thus the DataBase of versioned, simulated FileSystem looks something like this:

TABLE fs_versions
ver . operation . . . . . . . . . . . . fs_value(create,modify)[contents] . . . . . . . . . . . . . . . . . . . . . .
---------------------------------------------------------------------------------------------------------------------
  1. . . format. . . . . . . . . . . . . . (0,0)[] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
  2. . . mkdir /etc. . . . . . . . . . . . (0,1)[etc=>(1,1)[]] . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
  3. . . mkdir /usr. . . . . . . . . . . . (0,2)[etc=>(1,1)[] usr=>(2,2)[]]. . . . . . . . . . . . . . . . . . . . . . .
  4. . . mkdir /home . . . . . . . . . . . (0,3)[etc=>(1,1)[] usr=>(2,2)[] . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . home=>(3,3)[]]. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
  1. . . mkdir /home/you . . . . . . . . . (0,4)[etc=>(1,1)[] usr=>(2,2)[] . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . home=>(3,4)[you=>(4,4)[]]]. . . . . . . . . . . . . . . . . . . . . . .
  1. . . mknod /home/you/fA. . . . . . . . (0,5)[etc=>(1,1)[] usr=>(2,2)[] . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . home=>(3,5)[you=>(4,5)[fA=>(5,5)""]]] . . . . . . . . . . . . . . . . .
  1. . . write /home/you/fA "StringA". . . (0,6)[etc=>(1,1)[] usr=>(2,2)[] . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . home=>(3,6)[you=>(4,6)[fA=>(5,6)"StringA"]]]. . . . . . . . . . . . . .
  1. . . mknod /home/you/fB. . . . . . . . (0,7)[etc=>(1,1)[] usr=>(2,2)[] . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . home=>(3,7)[you=>(4,7)[fA=>(5,6)"StringA" fB=>(7,7)""]]]. . . . . . . .
  1. . . write /home/you/fB "StringBC" . . (0,8)[etc=>(1,1)[] usr=>(2,2)[] . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . home=>(3,8)[you=>(4,8)[fA=>(5,6)"StringA" fB=>(7,8)"StringBC"]]]. . . .
  1. . . mkdir /home/me. . . . . . . . . . (0,9)[etc=>(1,1)[] usr=>(2,2)[] . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . home=>(3,9)[you=>(4,8)[fA=>(5,6)"StringA" fB=>(7,8)"StringBC"]. . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . me=>(9,9)[]]] . . . . . . . . . . . . . . . . . . . . . . .
  1. . copy /home/you/fB /home/me/fC . (0,10)[etc=>(1,1)[] usr=>(2,2)[]. . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .home=>(3,10)[you=>(4,8)[fA=>(5,6)"StringA" fB=>(7,8)"StringBC"]. . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . me=>(9,10)[fC=>(10,10)"StringBC"]]] . . . . . . . . . . .
  1. . mknod /home/me/fD . . . . . . . . (0,11)[etc=>(1,1)[] usr=>(2,2)[]. . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .home=>(3,11)[you=>(4,8)[fA=>(5,6)"StringA" fB=>(7,8)"StringBC"]. . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . me=>(9,11)[fC=>(10,10)"StringBC" fd=>(11,11)""]]] . . . .
  1. . write /home/me/fD "StringD" . . . (0,12)[etc=>(1,1)[] usr=>(2,2)[]. . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .home=>(3,12)[you=>(4,8)[fA=>(5,6)"StringA" fB=>(7,8)"StringBC"]. . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . me=>(9,12)[fC=>(10,10)"StringBC" fd=>(11,12)"StringD"]]].

Disk drives have finite space, of course, so we must occasionally delete older versions of the FileSystem and collect the space freed by doing so. Typical versioned FileSystems, as seen with would simply collect versions based on a policy... e.g. keep at least an hourly version for the last month, a daily version for the last year, and a weekly version before then, though it often needs some variation for very-large-files (e.g. if performing video-editing, gigabytes of space is often required, and keeping those around is too expensive) and files that can be regenerated (like '.o' files from compilation). In any case, for versioned FS one benefits greatly from both inter-value sharing (as between versions of the FileSystem value) and intra-value sharing (as for "StringBC") which save space. Examples of versioned FileSystems include ZFS and Mac OS X's Time Machine (among others: http://en.wikipedia.org/wiki/Versioning_file_system).

After some versions are selected for deletion, they are removed from the DataBase, and a GarbageCollection must occur. E.g. if I deleted versions 5 and 6 above, then the physical space associated with nodes (0,5), (3,5), (4,5), (5,5), (0,6), (3,6), (4,6) would be available for reuse. Note that nodes (5,6), (1,1), and (2,2) are still shared by other versions and couldn't be collected. This need for GarbageCollection - recovery of the physical space in which a value or parts of a value are represented while having some concern for sharing - is a natural extension of the fact that structured DomainValues are generally shared as values rather than fully copied.

The mechanisms to achieving value sharing under-the-hood are already discussed, and include interning of values, CopyOnWrite, and disfavoring use of 'parent' pointers. Parent pointers force each parent to have a full copy of all children (though an approach specifically for versioned systems, especially useful for versioned graphs, is to use multi-version pointers where a given pointer points to a range of versions - I won't be using this here since it isn't generic to value-sharing). Anyhow, detailed discussion of this sharing is in RelationalTreesAndGraphsDiscussionTwo. So, since that's a solved problem, I'm going to assume we are all in agreement that, for example, every reference to "StringA" is potentially a reference to the same, physical copy of "StringA", and every reference to 'etc=>(1,1)[]' uses the same node (1,1)[] and potentially even the same internal representation of 'etc' (though the benefit there is marginal for small file-names). Use of interning of values would allow "StringBC" to be identical even in the case it wasn't produced by copy. So, despite the amount of copying that is apparent in the above representation, actual layout on disk could be quite compact. The physical layout of versions 7 and 8 would look something like as follows:

. . . . . (0,7) . . . . . (0,8) . . . . . .
. . .[home usr etc] . [etc usr home]. . . .
. . . . | . . \ . \ . / . / . . | . . . . .
. . . (3,7) . .\. (1,1) ./. . (3,8) . . . .
. . . [you] . . \ .[ ]. / . . [you] . . . .
. . . . | . . . .\. . ./. . . . | . . . . .
. . . (4,7) . . . (2,2) . . . (4,8) . . . .
. . .[fB fA]. . . .[ ]. . . .[fA fB]. . . .
. . . | . .\______. . .______/. .|. . . . .
. . (7,7) . . . . \ . / . . . .(7,8). . . .
. . ."" . . . . . (5,6) . . ."StringBC" . .
. . . . . . . . "StringA" . . . . . . . . .

This underlying structure and value-sharing isn't exposed to the users, but it does become important for indexing performance and a few other performance issues, and so it will come up later.

Indexing:

We have named as desiderata for quick searches, joins, queries such attributes as filename and modification and create time as well as some derived attributes including path, file-size, and file-content (e.g. lexical searches for files containing particular words). In a versioned FileSystem, these could be requested for a particular version or for a range of versions, and perhaps some new queries might be interesting (such as DeltaIsolation + differences between versions), but optimizing for those is beyond the current agenda.

Some relevant questions are:

I'm aiming here for a RealTime indexing solution (maintained for each update, maintenance cost proportional to update size) that is declarative, does not violate encapsulation of representation (i.e. users don't have access to pointers or tables under-the-hood), does not interfere with GarbageCollection, is constructed lazily, doesn't force a modification of the queries themselves, and where the solution can be applied generically to more than just versioned FileSystems. This is a tough combination of characteristics, but frankly I'm not interested in any solution that doesn't achieve them (though I'm willing to flex on the utilization-in-queries a bit). Beyond the constraints above, I also wish to guarantee that indexing has no side-effects, that all operations and computations that go into achieving the indexes will terminate, and that all indexing operations are well defined mathematically (i.e. type-safe indexing).

Indexing - Expression Of: The sub-questions here are, (a) precisely how do I express a particular index for filename, filesize, path, file-content, file-create and modify time, etc. (b) precisely how do I tell the RDBMS to maintain this index? And, of course, it isn't even that trivial: (a.prelude) how do I express the concepts of "filename", "filesize", "path", "file-content", etc? After all, before I can index over something, I must first define that 'something', and the above values are full 'filesystems'; it isn't as though "path" or even "filename" is an attribute in the RDBMS.

Before answering, I'll fall back just a little bit to explain what an index is. An index is, in essence, a search performed ahead of time so it doesn't need to be performed at query-time. A search, in turn, is simply one class of computation, and indexing is one form of preprocessing. Other forms of preprocessing include pre-caching (downloading parts of a page in anticipation of their use, or loading pages from HDD in advance), pre-instantiation (FlyweightPattern), table lookups for functions (memoization, http://en.wikipedia.org/wiki/Memoization, http://en.wikipedia.org/wiki/Lookup_table), advance compilation (CompileTime is preprocess for runtime, as opposed to JustInTimeCompilation). There are more examples, of course, but the reason I bring this up is the InventorsParadox. It turns out it is simpler to solve, and implement the solution for, a more general problem than just indexing.

We already have an established ways to tell an RDBMS about a computation in advance of its use: views, and user-defined functions (UDFs). Usefully, UDFs can be abstracted as views.

So, to answer to all three of the above questions: I will, in essence, describe concepts and what needs to be indexed as views, then I'll actually create the index by telling the RDBMS to maintain these views in advance of my requiring them. I'll be taking significant advantage of the ability for UDFs to recursively construct relations as part of defining views. For example, for purpose of querying for paths and files, the ability to associate all deep nodes back to their originating root 'fs_value' is useful

Anyhow, that's all I was able to write up this Sunday. I'll be back.

In the meantime, I leave TopMind with an exercise: if he really thinks he can get away without copies in the 'parent_id' schema (as discussed in RelationalTreesAndGraphsDiscussionTwo near (page anchor: node sharing example)), he should try to do so with the versioned FileSystem. I.e. represent step 3, then perform the operation to move to step 4 without copying the 'etc=>(1,1)[]' node, without destructively modifying the DomainValue used for the version in step 3.

I like the idea to not declare/create an index but instead let the system figure out the indexing needs based on usage pattern - which are embodied in the views. If this is what have in mind. I think views are generally undervalued. But I don't know why. Is it because they are not efficient in practice? Or is it because - like all things in the database - they are typically involved in a more elaborate change management process? I'd use views more if they'd be less cumbersome to introduce, maintain and use from most programming languages outside of the DB core. I think that views could and should even be generalized (polymorphism) such that they can be instantiated on different structures (tables) as needed. Having multiple 'views' of the same underlying data kept consistent is something you cannot emulate with any programming language I know of (except of course if you implement your own view package). -- GunnarZarncke Note: Figuring out the indexing from usage is also the topic of AdaptiveCollection.

While I too like the idea of automatic discovery of or adaption to usage patterns, I do not wish to appeal in this discussion to 'sufficiently smart' anything. This also means excluding searches and automated discovery of query optimizations, and favoring algorithmic approaches.

I on the other hand like the clear separation of concerns. Don't you think that using side effects of views for the creation of indexes is a bit to difficult to follow? ("I know I will have usage pattern X, so I have to create view XX to get this. Here I see view Y. What kind of access structure is implied by it. Is it a genuine view or just an access optimization?")

Views are a way of expressing in an RDBMS a named computation. The name allows the view to be leveraged by other queries. A view becomes an index when you tell the RDBMS to also prepare for rapid access to view data in advance of its use. If this meta-data about which views are 'prepared views' is available to those examining the SQL schema, it is unlikely to be a point of confusion. Meeting the goals above, deleting the index would not break queries, at least so long as you do not also delete the named computation - the 'view'. Because expressing a computation in advance of its requirement is not a concern separable from explicit indexing, this division between expressing the view vs. expressing+indexing the view allows a maximal separation of concerns that is possible with explicit indexing.

Admittedly, explicit creation of indexes, even if declarative, isn't quite so convenient as having the DBMS just guess or infer what it is you'll need in the future based on either the queries handed to it in the past, or perhaps based on actual abstract queries handed to it in advance of use. I wouldn't deny the DBMS the ability to implement such features; it is more that I don't wish to bring them into this discussion. Besides, when the goal is RealTime performance, the ability to tell the DBMS exactly what it needs to maintain would be critical, so good support for implicit indexing is only a substitute most of the time.

SeeAlso CouchDb seems to follow this IndexFollowsView pattern.


Re "under-the-hood". The original requirements called for 'swappable' implementations. In other words, any executable or service that satisfies requirements (ADT) should have the indexability characteristics. Of course if one can control the implementation, it can be integrated with the RDBMS or given RDBMS-like efficiency. I never disputed this.

Where did the "original requirements" call for such a thing?

I have been talking about user-defined DomainValue types (UDTs) including user-defined structure types (like trees, graphs, lists of trees of relations, and so on). The RDBMS is free to represent these types "under-the-hood" however it wishes, since representation is encapsulated (PROOF ONE and TWO in RelationalTreesAndGraphsDiscussionTwo). It only needs to provide for the operations over them. I have not wavered from this position. Ever. It has been the same in CrossToolTypeAndObjectSharing, DoesRelationalRequireTypes, RelationalTreesAndGraphsDiscussion, and RelationalTreesAndGraphsDiscussionTwo. I even went to special effort to make it abundantly clear that I'm not talking about 'encapsulation' since DomainValues are never encapsulated (PROOF THREE and PROOF FOUR on the same page).

In the meantime, you've been making allegations against 'structure' types, insinuating they fail at node and index sharing, their performance will sucks rotten apples, and so on.

Are you saying you haven't been disputing me this whole time? Well... you sure fooled me. Shame on me, I guess. *eyes roll*

But swappability means you cannot control the implementation. If you are talking about a special kind of "leaky ADT", that's another animal. True, it is possible to define a reference-only ADT such that only pointers/ID's to nodes are stored "in" the ADT instead of the nodes themselves. But such is kind of a de-fanged ADT. If there are additional requirements or restrictions your "type" assumes in order to satisfy the RDBMS-like requirements, please state them. I envisioned something along the lines of "efficient querying of any file-system that satisfies the POSIX requirements". -t

RE: If there are additional requirements or restrictions your "type" assumes in order to satisfy the RDBMS-like requirements, please state them. -- Sure. They need to be "value" types.

That's the only requirement there has ever been. But it just so happens that things that can be described by 'value' types must have intrinsic identity and intrinsically complete representation. These properties were described to you at the top of the DomainValue page, and have been described elsewhere. It seems you have an appalling deficiency in your education and you don't understand what a 'value' is. So I'll explain:

In general, values are mathematical constructs. They are immutable and forever, like Platonic forms, and they live in a 'language' which gives them representation. Your idea about 'encapsulation' or 'not controlling implementation' isn't particularly relevant. Representations of values can be encapsulated by services, modules, etc. thus preventing them from being copied or treated as values. But, when discussing with CrossToolTypeAndObjectSharing and DoesRelationalRequireTypes and MagicEverythingMachine, we have not been discussing 'encapsulated' values. We've been talking about values shared by MessagePassing, queries, and so on.

Is this news to anyone other than TopMind? ... Don't all speak up at once, now.

And, TopMind, a question for you: can you please re-establish and clarify your position with regards to the use of structure DomainValue UDTs?


I also want to perform relational operators on my existing file system. And above that I want to perform file system operations on by database. And I want to surf my filesystem and database with a browser. Possible it it. Everything is convertible to everything else. At least on a sufficiently abstract level.

Just imagine being able to