This made me scratch my head:
> Even in a language like Rust, which has not yet implemented mutable aliasing (an oft-requested feature stuck at the RFC stage)
Disallowing mutable aliasing is in fact the whole point of Rust. Perhaps it’s “requested” by people still learning the language? Is the “not yet implemented” part meant as irony? Does the author mean something else?
And it has mutable aliasing via `UnsafeCell`[0] which, as the name says, is unsafe. But safe wrappers can be built around it, such as `Cell`[1]. Or you can just use raw pointers and take safety into your own hands.
[0]: https://doc.rust-lang.org/std/cell/struct.UnsafeCell.html
[1]: https://doc.rust-lang.org/std/cell/struct.Cell.html
This also made me scratch my beard, but I think he maybe means allowing multiple mutable references to a variable when the compiler can prove that nothing bad happens. It's something a bunch of people have been trying to solve with different models, the most recent I know of is Tree Borrows [1].
Tree Borrows is a model of `unsafe` code (for example using raw pointers), it is not a proposal to change what the safe Rust borrow checker accepts.
Can someone explain the relationship between implementing WITH (…) AS X … and integer linear programs?
I wish this post had more examples of concrete non-tree queries. Is cross-joining a CTE to itself realistic?
Imo this discussion seems to assume multiple references of a CTE contain many of the same rows with different transforms which are later joined.. what’s an actual example of this? I don’t think I’ve ever seen it
Usually CTEs are shorthand for doing a few joins. And when you reference them multiple times, it's almost always with different rows from the CTE table, i.e. little to no reuse potential. (Assuming Postgres's WITH ... AS NOT MATERIALIZED, of course.)
More deserves to be written on the ILP idea, I haven't actually tried to make it work but it seems to me like the only real direction to optimize ("optimize" used in the mathematical sense, rather than the programming sense) queries that you can't just exploit the principle of optimality on (see this earlier article I wrote [1] for a bit more exposition). I think maybe some of the egg [2] people have experimented with this.
Speaking from experience, there are lots of rewrites you'd want to do in a query optimizer that access to efficient DAG-shaped query plans would make tenable; as a specific example, they are an important part of doing full subquery decorrelation [3].
[1] https://buttondown.email/jaffray/archive/why-are-query-plans...
[2] https://egraphs-good.github.io/
[3] https://www.scattered-thoughts.net/writing/materialize-decor...
Also, just in general, there's the idea of at least being worst-case optimal: for a fixed query, a multi-way join-project runs in time polynomial in the number of input rows even on pathologically connected/connecting inputs, with the exponent given by concepts like the "fractional hypertree width" of the query.
the simplest example is a simple two-column table of a directed graph's edges, e(from, to). Then: Triangle(a, b, c) := e(a, b), e(b, c), e(c, a).
#Triangle <= c * (#e)^1.5, for a small c. (I think c = 3 in this case, but I'm not sure.)
Using binary joins on pathological input will however take O((#e)^2). If you instead do row-granularity lookups using auxiliary indices (two in the simplified version here, though they can be merged by pushing the post-lookup choice down to index creation), along with indexing e twice (once for each of the two columns), you can do it in O((#e)^1.5):
Create auxiliary indices eaux1 and eaux2: eaux1(from, #e(from, *)) records each node's out-degree, and eaux2(to, #e(*, to)) records each node's in-degree.
Iterate over e(a, b): Look up eaux1(b, ?countbc) and eaux2(a, ?countca). If countbc < countca, then: expand through point lookup e(b, ?c), and filter the resulting e(a, b), e(b, ?c) candidates using a point-membership-suitable index on e(c, a). Else: expand through point lookup e(?c, a), and filter the resulting e(a, b), e(?c, a) candidates using a point-membership-suitable index on e(b, c).
Done.
As can be seen, the trick is to select, at each nesting level of the (generally nested-loop-join structured) query execution, between the current options for "next table to join the current partial row against", using premade indices that reveal the iteration count of the upcoming inner loop in O(1) time.
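A minimal Python sketch of that degree-comparison trick (all names here are mine, not from any particular system): iterate over the edges and, at each edge, use the precomputed degree indices to pick the cheaper of the two expansion directions. Note that this counts each triangle once per constituent edge, i.e. three times.

```python
from collections import defaultdict

def count_triangles(edges):
    """Count directed triangles a->b->c->a, choosing at each edge
    whichever expansion direction has the smaller candidate set."""
    out_nbrs = defaultdict(set)   # plays the role of eaux1: from -> {to}
    in_nbrs = defaultdict(set)    # plays the role of eaux2: to -> {from}
    edge_set = set(edges)         # point-membership index on e
    for a, b in edges:
        out_nbrs[a].add(b)
        in_nbrs[b].add(a)

    count = 0
    for a, b in edge_set:
        # compare the sizes of the two possible inner loops in O(1)
        if len(out_nbrs[b]) <= len(in_nbrs[a]):
            # expand e(b, ?c), filter with a membership test on e(c, a)
            count += sum(1 for c in out_nbrs[b] if (c, a) in edge_set)
        else:
            # expand e(?c, a), filter with a membership test on e(b, c)
            count += sum(1 for c in in_nbrs[a] if (b, c) in edge_set)
    return count
```

Each iteration of the outer loop does work proportional to the smaller of the two candidate sets, which is what keeps the total under the O((#e)^1.5) bound.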
> Is cross-joining a CTE to itself realistic?
Yes! Technically you have to say `WITH RECURSIVE ...` first, but yes, it's a very common thing to do: for example, to compute the transitive closure of nested groups. Just remember to use `UNION`, not `UNION ALL`! (Otherwise you will get an infinite loop if there's a circular group membership.)
WITH RECURSIVE closure AS (
    -- Seed the closure
    SELECT parent, child
    FROM groups
    -- Repeat the right side of the UNION
    -- until no new rows are added
    UNION
    -- Recursively join the closure
    -- to the groups
    SELECT g.parent, c.child
    FROM groups g
    JOIN closure c ON g.child = c.parent
)
SELECT * FROM closure;
I didn't know about datalog before reading this! I became kind of smitten with prolog in college but never found an opportunity to use it.
Now that I'm a gamedev, if there's a permissively licensed C/C++ library, or a DLL, or something in .NET land that would let me store facts and rules and run some queries, I could put it to work.
If anyone reading this has cooked with deductive databases / theorem provers / logic programming languages, I'd love to hear your guidepost opinions.
There's Souffle[1], which can synthesize C++ for you that you then compile with the rest of your C++.
[1]: https://souffle-lang.github.io/
CozoDB
CozoDB looks like a winner that checks all the boxes, thank you.
It's so nice to integrate, compared to most of the very well known behemoths. Kudos to the creator (who I think is relatively young and clearly incredibly smart).
Huh yeah, that was very easy. I've already got CozoDB working in Unity3d, even though there was no apparent existing C# bindings. Cool.
What part of gameplay will you use it for?
I'm thinking that bare minimum it would be a good way to handle persistence. It's a simple way to do a KV store that's easy to write to a file. I think the same thing about sqlite.
The fact it comes with baked in graph traversal means I could offload pathfinding to it for certain kinds of projects (thinking like turn based strategy games on a hex grid, multiplayer board games). I'm a big fan of isomorphism in anything multiplayer.
There's some more esoteric things I want to think about, I just don't know datalog enough yet to know how much magic this will bring to these ideas.
I need to learn enough to answer questions like "how hard is it to solve a game of clue (the boardgame, with the rooms and the murder by candlestick) with a query in cozo?" -- If it's no sweat, like I think it might be, then that opens up a lot of possibilities for deductive AI systems. Constraint based solving / deduction is what I'm really interested in. Once I get a sense of what I can do, the ideas will come.
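This isn't Cozo's actual query syntax, but a toy forward-chaining fixpoint in Python shows the flavor of what a datalog engine does under the hood for the nested-groups case (all names here are illustrative):

```python
def transitive_closure(member_of):
    """Naive datalog-style fixpoint for the rules:
         in_group(X, G) :- member_of(X, G).
         in_group(X, G) :- member_of(X, H), in_group(H, G).
    Set semantics (like UNION, not UNION ALL) guarantees
    termination even when the membership graph has cycles."""
    facts = set(member_of)
    while True:
        derived = {(x, g)
                   for (x, h) in member_of
                   for (h2, g) in facts
                   if h == h2}
        new = derived - facts
        if not new:          # fixed point reached: no new facts
            return facts
        facts |= new
```

For example, from `{("alice", "admins"), ("admins", "staff"), ("staff", "everyone")}` the loop derives `("alice", "everyone")` after two rounds, and it terminates even on circular memberships like `{("a", "b"), ("b", "a")}`.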
State machines, graphs and trees are data structures I use _constantly_. I want to love cozo enough to promote it to handling those almost every time I use one that might benefit from persistence or have any chance of enjoying emergent mechanics from being able to create cross system relations.
I saw the API had Dijkstra's, but I didn't see A*, which seemed interesting. Lots to learn.
For clarity, in (computer science) graph theory, a tree is a rooted, strict, connected, undirected, acyclic graph.
Explanation:
rooted: One vertex (also called a node) has been designated the root. The depth of a vertex is the number of edges on the path from the root to that vertex. So the depth of the root itself is zero, and it is the only vertex in the tree with this property.
strict: The edges have no weights.
connected: A graph is said to be connected if there is a path between every pair of vertices; from every node, there is a path to any other node.
undirected: The edges have no directionality specified.
acyclic: There are no closed loops (no cycles of any length).
graph: An abstract data structure consisting of a finite set of vertices and a finite set of edges (pairs of vertices).
As another commenter has pointed out, the definition of a tree in mathematics is somewhat different...a tree need not have a root.
Out of curiosity does that definition come from somewhere specific? I would argue that strictness and undirectedness are not prerequisites for a data structure to be called a tree.
For example, B-trees are directed trees, and Huffman Coding is based on a weighted tree structure.
I gave my good faith understanding of what a tree is, but your comment made me think more about it, and to look it up.
From chapter two of the book "Algorithmic Graph Theory and Sage" [1]; it states the following concerning directedness:
"A directed tree is a digraph which would be a tree if the directions on the edges were ignored." ... which is the correct definition from my understanding.
However, the authors go on to say:
"A rooted tree can be regarded as a directed tree since...Directed trees are pervasive in theoretical computer science, as they are useful structures for describing algorithms and relationships between objects in certain datasets."
So, colloquially (but perhaps not formally), one can refer to a directed, rooted, connected, acyclic graph as a tree.
Concerning weights, wolfram provides a definition of a strict graph [2] which specifies the unweighted qualifier (and undirected qualifier):
"A simple graph, also called a strict graph (Tutte 1998, p. 2), is an unweighted, undirected graph..."
The wolfram definition of a tree [3] includes the 'simple' qualifier:
Tree "is a simple, undirected, connected, acyclic graph..."
So, I would suggest formally, a tree cannot have weights (recall that weights are basically labels assigned to the edges).
Incidentally, I took down from the shelf my copy of The Art of Computer Programming (TAOCP) Volume One, and section 2.3 deals with trees exclusively. Donald Knuth asserts that trees are 'the most important nonlinear structures that arise in computer algorithms'. I skimmed the entire section and although he makes reference to weighted path lengths, I did not see any mention of weighted trees.
So, to my understanding, the original definition I proposed is valid, but I am very open to correction.
Concerning spanning trees and Huffman Coding - the tree are constructed from weighted graphs (not that the trees are weighted themselves). However, I understand where you are coming from, where again colloquially (but perhaps not formally), people often refer to and construct weighted trees.
At the beginning of Section 2.3.4 in TAOCP, Knuth makes the following important point concerning graph theory:
"Unfortunately, there will probably never be a standard terminology in this field, and so the author has followed the usual practice of contemporary books on graph theory, namely to use words that are similar but not identical to the terms used in any other books on graph theory...the nomenclature used here is also biased towards computer applications".
So, in summary, my definition of a tree is open to debate :)
I am grateful for your question because in seeking out the answer, it has deepened my understanding of what a tree is.
[1] "Algorithmic Graph Theory and Sage" by David Joyner, Minh Van Nguyen, David Phillip; Version 0.8-r1991;
A tree is a type of graph. Is the headline meant to reveal the fact that not all graphs are trees (which is obvious) or is it meant like "not all graphs are trees, and here's how you deal with those non-tree graphs?" Even then I'm confused about the title of the article.
For anyone getting caught up in the notation, note that the actual example expressions are immaterial; you might as well replace the strange-looking relational algebra operators with "="s or "+"s. At its core, the article is covering different ways to represent graphs, and demonstrating that one particular method allows concise representation of graphs that aren't easily represented by linear expressions (linear in the graph sense, not the linear algebra sense).
In order, the representations are:
1. "Inlining", a direct tree structure, in C++-like syntax something like `struct Node { vector<Node> children; };`. The insight here is that if what you're storing isn't a tree, some of the nodes need to be duplicated.
2. "Let Bindings", an indirect tree structure, like `struct Node { vector<Node*> children; };`. Here, the insight is that instead of storing nodes, you store names of nodes (in this case pointers), so that the nodes can be reused. Note that the nodes themselves may be stored somewhere else in the same expression.
3. "Self-Reference", same as above, but a node can store the name of itself, or one of its ancestors.
4. "Fixpoint Operator", instead of explicitly storing a graph, we store a transformation from one graph to another, then say the graph we want equals that transformation repeatedly applied to some starting graph, until the transformation's output equals its input (a.k.a. a fixed point is found). Something like `struct ImplicitGraph { Graph initial; Graph (*transform)(Graph); };`. If you're unfamiliar with C++ function pointers, transform is a function from Graph to Graph. The "hole" the author talks about refers to the fact that the output graph of the transformation can contain copies of the input graph, so the "hole" is the transform parameter; there's a "hole" in the output graph until the parameter is known. Since the transformation can be arbitrarily complex, any graph can be represented, although typically you'd want as simple a transform as possible.
5. "Explicit Listing of Edges", this method is actually more straightforward than the previous, but it requires thinking about the graph a little differently. It's exactly what it says on the tin; instead of the graph being implicit in the structure of the data, the edges (a.k.a. links or lines) are explicitly listed. There's more than one way to do this, for example a straightforward way might look like `struct Graph {set<Node>; set<pair<Node*, Node*>>;}`. From the example in the article, it looks like Datalog is closer to something like `struct Graph {set<Node> initial_nodes; set<optional<Node> (*)(Node)> generators;}`. Kind of like in 4., you start with a set of nodes, then repeatedly apply each generator to each node until you stop getting new nodes. The edges in this case are between each node and the nodes it generates.
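To make representations 1, 2, and 5 concrete, here's a small Python sketch (the structures and names are mine) using the classic diamond graph a -> b, a -> c, b -> d, c -> d, where d is shared:

```python
# The diamond graph: a -> b, a -> c, b -> d, c -> d (d is shared).

# 1. "Inlining": a pure tree structure, so the shared node d
#    must be duplicated, once under b and once under c.
inlined = ("a", [("b", [("d", [])]),
                 ("c", [("d", [])])])

# 2. "Let bindings": every node gets a name; children are referred
#    to by name, so d is stored exactly once.
bindings = {
    "d": [],
    "b": ["d"],
    "c": ["d"],
    "a": ["b", "c"],
}

# 5. "Explicit listing of edges": the graph is just a set of pairs.
edges = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}

def count_occurrences(tree, name):
    """Count how often a node label appears in the inlined form."""
    label, children = tree
    return (label == name) + sum(count_occurrences(c, name) for c in children)
```

In the inlined form `count_occurrences(inlined, "d")` returns 2, which is exactly the duplication that the named and edge-list forms avoid.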
Finally, all of these types of representations also exist syntactically, not just structurally. After all, an expression in a given syntax is just a way of representing some structure, the same way a bunch of bytes in memory can be a way of representing some structure. And each of these representations has strengths syntactically and structurally.
When I hear 'tree' I think https://en.wikipedia.org/wiki/Directed_acyclic_graph
Trees are a subset of DAGs — a DAG where each node has a single parent except for a specific root element which has none.
> where each node has a single parent except for a specific root element which has none.
Slight nitpick: in mathematics, a tree need not have a root.
https://mathworld.wolfram.com/FreeTree.html: “a normal tree with no node singled out for special treatment”
https://mathworld.wolfram.com/RootedTree.html: “A rooted tree is a tree in which a special ("labeled") node is singled out. This node is called the "root" or (less commonly) "eve" of the tree. Rooted trees are equivalent to oriented trees (Knuth 1997, pp. 385-399). A tree which is not rooted is sometimes called a free tree, although the unqualified term "tree" generally refers to a free tree.”
Huh, TIL - though
> A tree which is not rooted is sometimes called a free tree, although the unqualified term "tree" generally refers to a free tree.
I wonder if that was once the case but no longer is. I'm realizing I think of trees mostly through the lens of data structures rather than graph theory, and I imagine more people do than not.
Technically a tree doesn’t need to be directed.
A tree is (also) any loop-free graph.
A forest is an acyclic graph. A tree is a connected forest.
Exactly one path between any two nodes is how I best visualize it.
A forest is where all nodes are connected by at most one path. 0 or 1. And a single tree is a forest. And a single node is a tree.
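These characterizations can be checked mechanically; here's a sketch in Python (function name is mine) using the equivalent criterion that a nonempty undirected graph is a tree iff it is connected and has exactly one fewer edge than it has nodes:

```python
def is_tree(nodes, edges):
    """A nonempty undirected graph is a tree iff it is connected
    and has exactly len(nodes) - 1 edges (connected + acyclic)."""
    if len(edges) != len(nodes) - 1:
        return False
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # depth-first search from an arbitrary node to check connectivity
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n])
    return len(seen) == len(nodes)
```

The edge count rules out both cycles (too many edges) and forests of more than one component (too few, once connectivity is required).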
You gotta be careful -- a path does not exist between two leaf nodes, unless your edges are non-directional, in which case you're right. Oh and restrict it to 'only visit each node once' given undirected.
Sorry my inner edge case fairy beckoned.
Visiting a node twice implies a loop, which implies two paths between your original nodes.
Let’s say we want to get from A to B, but we visit Y twice. That means there’s a segment of A~B that is Y~Y (a loop). However, since we’re talking about undirected edges, we can reverse that segment. Then both our original A~B and the A~B with Y~Y reversed are paths from A to B.
So if there’s a single path from A to B, there cannot be a Y we visit twice along that path.
(Please don’t take this as criticism; my pedantic nature got the better of me.)
Yes, but in an undirected graph you can do loops easy peasy: A->B->A->B etc. If you have a tree with a parent node, then the tree (A{children: [B], parent: null}, B{children: [], parent: A}) still may allow for this behavior (the graph representation unions parent and children, since a parent in tree parlance is basically 'incoming edges' in graph parlance, while children are 'outgoing edges').
I was trying to find terminology for something Like 'all possible paths between these two nodes contain exactly one Hamiltonian sub-path' but I think that's a bit of a circular definition
All that said I'm running off 3 hours of sleep and about to recoup so that probably explains my non constructive comments here
any loop-free graph which is connected (a graph of two unlinked dots would be loop-free, but wouldn't be a tree)
(to complement cperciva's answer with a counter example)
I think that I shall never see / A graph more lovely than a tree. / A tree whose crucial property / Is loop-free connectivity. - Radia Perlman, inventor of spanning tree
The article talks about them on the progressive journey to generalized graphs.
1960: Arrays (Lisp)
2001: Arrays & Trees (Json)
2042: Graphs, Trees, Arrays (XML final era)
One of the main problems in programming is that a self-referential structure may be cyclic, and a cyclic structure combined with eager evaluation leads to an infinitely looping program.
As Haskell is lazy, this self-referencing is quite nice. There is a Reddit joke about implementing an is-even function: in Haskell we can write an infinite list of alternating True/False as `alt = True : False : alt`, and is-even is then `is_even n = alt !! n`, where `!!` is list indexing.
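A rough Python analogue of the same trick (inherently clunkier, since Python iterators are stateful consumers rather than lazy values):

```python
from itertools import cycle, islice

def is_even(n):
    """Index into an infinite alternating stream
    True, False, True, False, ... like Haskell's `alt !! n`."""
    alt = cycle([True, False])         # like `alt = True : False : alt`
    return next(islice(alt, n, None))  # skip to position n and take it
```

A fresh `cycle` is built per call because, unlike Haskell's list, the Python iterator is consumed as it is indexed.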
Even if the data structure is not cyclic, but you're talking about sharing, then you have to have syntax about explicit sharing and the lifetime of the share (immutability comes to the rescue again), as detecting sharing is a very difficult problem in and of itself.
Did anyone catch why `Let` type in the second code listing had to have both `expr` and `value` properties?
let x = value
in expr(x)
#NotAllGraphs
I get this feeling about myself when I see this impenetrable notation with no introduction.
It's relational algebra. Think SQL, but with all the keywords replaced by symbols.
> What one fool can do, another can. (Ancient Simian Proverb.) —SPT
Not all graphs are trees, but all the graphs in this article are trees.
They are DAGs, which are a deduplicated representation of trees.
A DAG is not a tree. It literally has graph in the name. In a tree each node has exactly one parent / incoming edge. Not the case for DAGs.
In the area of programming languages and compilers, we usually have trees without parent links, with substructure sharing, which are DAGs. We call them trees because most of the code operating on them thinks they are.
E.g. the Lisp expression (+ a (* b b)), informally called a syntax tree, is actually a DAG. Both apparent instances of the b symbol are actually graph edges leading to the same symbol object. Moreover, nothing points to its parent; that would be horrible.
Other kinds of trees don't always have parent pointers. For instance search trees like red-black can be implemented with parent pointers, or without; e.g. with functional recursion.
> E.g. the Lisp expression (+ a (* b b)), informally called a syntax tree, is actually a DAG
That is a syntax tree. Syntactically it is a tree. The fact that you have a later step that recognises that the `b` identifiers refer to the same object and turn it into a DAG doesn't mean it isn't a tree initially.
Most of the time when the word "tree" is used in this context, it is in reference to that DAG, not the printed version.
E.g. see the tree and tree-structure glossary entries in ANSI CL.
https://www.lispworks.com/documentation/lw61/CLHS/Body/26_gl...
I guess the parent means that it's relatively straightforward to convert a DAG to a tree by duplicating nodes (or rather, sub-graphs) that have multiple parents.
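That duplication can be sketched in a few lines of Python (names are mine): expand a DAG given as an adjacency map into an explicit nested tree, copying any shared sub-graph under each of its parents.

```python
def dag_to_tree(node, children):
    """Expand a DAG (adjacency map: node -> list of successors) into
    a nested (label, subtrees) tuple tree, duplicating shared
    sub-graphs. Assumes acyclic input; a cycle would recurse forever."""
    return (node, [dag_to_tree(c, children) for c in children.get(node, [])])
```

For the diamond a -> b, a -> c, b -> d, c -> d, the shared node d ends up appearing twice in the resulting tree, once under b and once under c.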
Yeah I think so but that doesn't mean a DAG is a tree.