## Tuesday, July 26, 2005

### hardness of the URE/SURE scheduling

The input to a SURE scheduling problem is an (n+d)*m matrix, and we are to find the minimum running time of the system (given a reasonable agreement on what that statement means).

n has to be >= 1.
• If n=1, then we have a URE. The scheduling problem for n = 1 is as hard as an LP problem of size d*m.
• When n>1, the problem is harder.

OTOH, d has to be >= 0
• If d = 0, we have a digraph and the scheduling problem is a DFS!
• If d = 1, we have a digraph whose weights are scalars. This means that the scheduling problem is as hard as APSP.
• For d > 1, the problem becomes harder.
## Sunday, July 24, 2005

### A data structural approach to detect cycles

Refer to the previous post about the construction of the hypergraph with ordinary nodes and hypernodes.

The ordinary nodes correspond to the number of times an edge has been selected in the PRDG. A hyperedge goes from a subset of (ordinary) nodes to a subset of nodes. Either the head or the tail of a hyperedge can be an empty subset.

Now define the arity of a hyperedge as the sum of the cardinalities of its tail and head sets (its indegree plus its outdegree).
Some simple observations:
For a hyperedge from S to T with indegree(v) = outdegree(v) = 1 (i.e., card(S) = card(T) = 1), we can "club" the two singleton sets into another singleton set. The hyperedges which begin or end at the vertices of S and T can then be made to begin or end at the new vertex, which forms its own singleton set.

The hypernodes can be stored in a priority queue, which supports the following operations:
• deletion of the hypernode with the minimal arity

Also, a hypernode v with indegree(v) = 0 or outdegree(v) = 0 can be removed from the queue. This leads to a pruning of the graph:
• removal of the hypernode from the priority queue (this hypernode may not be at the top of the queue);
• pruning of the hyperedges which begin or end at these vertices.
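A toy sketch of this pruning (my own encoding: a hyperedge is a (tails, heads) pair of node sets; for clarity the priority-queue bookkeeping is replaced by recomputing degrees each round, which the queue keyed by arity would make incremental):

```python
def prune(hyperedges):
    """Iteratively drop nodes with indegree 0 or outdegree 0 together
    with the hyperedges that begin or end at them.
    hyperedges: list of (tails, heads) pairs of node-id sets."""
    alive = set(range(len(hyperedges)))
    while True:
        indeg, outdeg = {}, {}
        for i in alive:
            tails, heads = hyperedges[i]
            for v in tails:
                outdeg[v] = outdeg.get(v, 0) + 1
                indeg.setdefault(v, 0)
            for v in heads:
                indeg[v] = indeg.get(v, 0) + 1
                outdeg.setdefault(v, 0)
        dead = {v for v in indeg if indeg[v] == 0 or outdeg[v] == 0}
        if not dead:
            return [hyperedges[i] for i in sorted(alive)]
        # prune every hyperedge incident on a dead node
        alive = {i for i in alive
                 if not (hyperedges[i][0] | hyperedges[i][1]) & dead}
```

On a 2-cycle nothing is pruned; on a chain everything eventually disappears.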
--
To do: work out some examples of the particular linear programs of unitary form;
solve the dual linear programs;
look at the original and dual graphs.
## Thursday, July 21, 2005

### transitive orientations and separating hyperplanes

Given the unitarized block matrix with (n+d) rows and m columns, we can draw a hypergraph G with (n+d) nodes and m edges in the following way:
(How can you use Euler's theorem in these proofs?)
The PRDG has no self-loops; they can be eliminated in a simple way.
---------------------------------------------------------
====================
Alg1: Algorithm to construct G
====================
For every row of the block matrix, there is a node in the hypergraph. /* There are (n+d) nodes in total in G */
For every column of A, do the following:
If a row has a +1 in that column, we make the corresponding node a tail of the hyperedge; if a row has a -1, we make the corresponding node a head. (A 0 does nothing: by a restriction on the input PRDG, a 0 in the incidence matrix corresponds to no incidence.)
---------------------------------------------------------
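A minimal sketch of Alg1, assuming the block matrix A is a numpy array over {-1, 0, +1} and a hyperedge is a (tails, heads) pair of row-index sets:

```python
import numpy as np

def build_G(A):
    """Alg1 sketch: one node per row of the (n+d) x m block matrix A,
    one hyperedge per column. Rows holding +1 form the tail of the
    hyperedge, rows holding -1 form the head (0 entries contribute
    nothing)."""
    edges = []
    for j in range(A.shape[1]):
        tails = set(np.flatnonzero(A[:, j] == +1))
        heads = set(np.flatnonzero(A[:, j] == -1))
        edges.append((tails, heads))
    return edges
```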
We can construct another graph G' in the following way:
---------------------------------------------------------
====================
Algorithm to construct hypergraph G'
====================
Notation:
The input PRDG is unitarized by step A_1 of the algorithm (given above). So in the rest of the paper, we refer to the unitarized PRDG G_u simply as a PRDG. We also refer to it as a graph when the context is clear (in such instances, we refer to the other graph as a hypergraph). We also refer to the PRDG in terms of its block matrix. The block matrix is an (n+d)*m matrix, with the columns corresponding to its edges. The top n rows correspond to the incidence matrix of the PRDG, which is just a directed graph. The bottom d rows of the block matrix contain the d-dimensional dependence vectors associated with the corresponding edges.

G': A hypergraph constructed from the PRDG. Specifically, it has
• a node for each column of the block matrix A (or, a node for each edge of the PRDG)

• a hyperedge for each row of the matrix A. This means a hyperedge for each vertex of the PRDG and a hyperedge for each component of the dependence vector. The tail of the hyperedge consists of all the columns of A that have an entry equal to -1 in that row; the head consists of all the columns that have an entry of +1.

• one hypernode per hyperedge of the hypergraph G': these are the additional nodes that come up in the analysis. Refer to section ... for additional notes on the types of hypernodes.
---------------------------------------------------------
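A matching sketch for G', in the same toy encoding. Since the roles of rows/columns and of +1/-1 swap, this is just the Alg1 construction applied to -A transposed:

```python
import numpy as np

def build_G_prime(A):
    """G' sketch: one node per column of A (one per PRDG edge) and one
    hyperedge per row. Columns with a -1 in that row form the tail,
    columns with a +1 form the head. Equivalent to running the Alg1
    construction on (-A) transposed."""
    edges = []
    for i in range(A.shape[0]):
        tails = set(np.flatnonzero(A[i, :] == -1))
        heads = set(np.flatnonzero(A[i, :] == +1))
        edges.append((tails, heads))
    return edges
```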

Lemma 1: G' = COMP(UNDIRECT(G)), i.e., G' is the graph obtained by complementing the underlying undirected graph of G (see proof below).

Lemma 2: In graph G, all the hypernodes are either sources (all edges outgoing) or sinks (all edges incoming).
Proof (by construction):
Corollary: in graph G', for every hypernode v, indegree(v) = outdegree(v)

Theorem: The given system of equations has a solution iff the orientation of G is a transitive orientation.
Proof(if): The orientation of G is a transitive orientation

Corollary: The given system of equations has a zero cycle iff the orientation of G' is a transitive orientation.
--
What is a separating hyperplane? How to do decomposition?
--
The given PRDG has at least one alternating sign in every row and column.
--
Dependence analysis of unitarized PRDGs
## Wednesday, July 20, 2005

### separating hyperplane for an n-dimensional unitary polytope

From sci.math on 13-JUL

Why don't we just say "polyhedra" instead of saying "convex polyhedra"? Isn't every polyhedron convex by definition (an intersection of affine halfspaces)?

A poster replied

Well, it's really a terminology issue. Some people don't consider polyhedra convex. So for example this person's site: http://www.korthalsaltes.com/ has lots of non-convex polyhedra (eg: great icosahedron).
But yeah, in discrete geometry the definition necessitates convexity.

--
Frame the question of ZCD in terms of the polyhedron. Frame a version of the Farkas lemma for this type of polyhedra. What does the unitary polyhedron mean geometrically? What does its dual mean?
--
Draw the vectors in the (n+d) dimensional space.
--
Approach by paths on hypergraphs.
--
Given a collection of hyperplanes, all of which pass through the origin, find whether there exists a second point which lies on all the planes. Applying a variant of the Farkas lemma (version II in Ziegler's book), this question is equivalent to finding whether there exists a row vector c in the m-dimensional dual space with cA >= 0 and c < 0.
--
What does it mean to say that there exists a vector c which has a positive distance to every column of the {0,-1,+1} matrix A?
Get the Farkas lemma for separability.

## Monday, July 18, 2005

### directed hypergraphs and MDS scheduling

Assume we have unitarized the PRDG. Also, there exist no self-loops (a zero means no edge; that's it). We obtain an (n+d)*m matrix. What if we interpret this as a directed hypergraph?
--
We are looking for a cycle whose edge weights sum to zero. So does this map to the presence of a cycle in the new model?
A simple example with 2 vertices and anti-parallel edges of weights (1,-1) and (-1,1) shows a cycle in the new graph. What does this mean?
Are we asking for an APSP in hypergraphs?
We are looking at a class of graphs called directed hypergraphs; call them DHGs. How is a DHG represented? Show figures of the representation.
Found the tutorial paper.
--
Does this mapping from PRDG to DHG map to detection of cycles in the new graph?
Define a path and cycle. B-arc, F-arc.
We can do a DFS and assign to each vertex a DFS number.
--
To do: make examples and check them, especially the ones similar to KMW.
What does MDS mean? coloring?
--
Call nodes that correspond to the bottom part of the block matrix fictitious nodes. Cycles which involve the fictitious nodes do not seem to matter. Clarify this statement.
Statement of the zero cycle detection: begin at an "actual" node and do a DFS. If we reach a visited node again, we have found a cycle. Is this right?

--
To do: read how a DFS detects cycles; types of edges: tree edges, back edges, cross edges and forward edges.
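For reference, the standard DFS cycle test sketched on a toy adjacency-list digraph: a cycle exists iff the DFS finds a back edge, i.e. an edge into a vertex still on the recursion stack.

```python
def has_cycle(adj):
    """DFS cycle detection: adj maps each vertex to its successor list.
    GRAY marks vertices on the current recursion stack."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in adj}
    def visit(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:   # back edge -> cycle
                return True
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK           # tree/forward/cross edges are safe
        return False
    return any(color[v] == WHITE and visit(v) for v in adj)
```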
--
Did some part of DV-KMW-EX.
A "ZC" should touch both the original vertices and the new vertices.
Partition the nodes into actual nodes and dist-nodes (call them D-nodes). A ZC in the PRDG should touch both actual nodes and D-nodes.
--
Group the nodes into G-nodes and D-nodes. If there exists a cycle which has at least one node from the G-set and at least one node from the D-set, then we have a zero-cycle in the original PRDG. Mark a hyperedge e (u_1...u_i -> v_1...v_j) as dashed if u_k (k \in 1..i) \in G-set and v_l (l \in 1..j) \in D-set, or vice versa.

Look up the definition of Hypergraph from here. Also, get the definition of dual of a hypergraph, among other definitions.
This is a restatement of the Farkas lemma.
--
Modular memory allocation: Fibonacci example.

### semidefinite programming and MDS approaches

There are many ways in which we can formulate the ZC-D or MDS using SDP. Here are the approaches for ZCD.
Method 1: Begin with an approach similar to the LP formulation. Define a diagonal matrix X such that diag(X) = x, where x is the x of CM-Z (or the q of DV-KMW). To recap, x in CM-Z determined the number of times an edge is taken in a ZC. Also define (n+d) diagonal matrices A_k such that diag(A_k) = kth_row(A), where A is the same as the A of CM-Z (or the B of DV-KMW). Just to recap, A in CM-Z has as its first n rows the incidence matrix of the graph and as its next d rows the d-dimensional weight vectors associated with the edges.
The resultant "SDP" is the following:
-------------------------------
min X
s.t
A_k @ X = 0
forall k = 1...d
where A_k and X are all diagonal matrices.
//A and X belong to M_(m)
-------------------------------
Method2:
min X
s.t A_k @ X >>= 0
forall k = 1...d
where A_k is the adjacency matrix annotated with the kth components of the dependence vectors.
This method raises many questions:
A_k is not a symmetric matrix. Can we make it a symmetric matrix, using a discrete Lyapunov kind of equality?
What does that mean for the matrix X??

--
To do: Look up the spectral graph theory book to find how weights on the edges are represented.
What is the meaning of the eigenvalues of a weighted directed graph? How do you interpret them with respect to any of the known graph algorithms?

## Friday, July 15, 2005

### more semidefinite programming

Zero cycle detection on a graph can be done by a single LP program, the same as the one in CM-Z. So it can trivially be made an SDP.
----------------
The original LP is
min 1*x
s.t A x = 0
0 != x >= 0
---------------
The "SDP" would be
min I*X //I is a unit matrix of size n*n. X is a diagonal matrix of size n*n
s.t A_i . X >>= 0 for all i = 1..nrows(A) // A_i is a diagonal matrix such that diag(A_i) = i th row of A. Also, >>=0 is the relation for SEMI DEFINITE
0 != X >= 0
----------------
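The feasibility form of the LP above can be tried out directly. A sketch assuming scipy, with the normalization 1·x = 1 standing in for the awkward x != 0 constraint; the example matrix is my encoding of the 2-vertex, anti-parallel-edge graph with weights (1,-1) and (-1,1) from the July 18 notes:

```python
import numpy as np
from scipy.optimize import linprog

def has_zero_cycle(A):
    """Does Ax = 0 admit a nonzero x >= 0?  The extra row sum(x) = 1
    rules out the trivial solution x = 0."""
    m = A.shape[1]
    A_eq = np.vstack([A, np.ones((1, m))])
    b_eq = np.append(np.zeros(A.shape[0]), 1.0)
    res = linprog(np.zeros(m), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * m, method="highs")
    return res.status == 0  # 0 = optimal/feasible, 2 = infeasible

# incidence rows on top, weight rows below
A = np.array([[ 1, -1],
              [-1,  1],
              [ 1, -1],
              [-1,  1]])
```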
Since this conversion is trivial, can we do something more clever?
How can we schedule the points using SDP?
To do: read Lovasz's notes, Vazirani's chapter, the chapter from complexity-approximation. See if SDP is covered in Schrijver's new book.
--
Zero-cycle and eigenvalues/vectors: The zero cycle equation is Ax=0. Compare this with the eigenvalue equation Ax=(lambda)x. This would mean that, for a (square) matrix A of a graph with zero-cycles, 0 is an eigenvalue and the circulation x is a corresponding eigenvector. Is this right? What could be a better classification?
--
Approach1:
What happens if we actually take the ZC equation and solve it for eigen values?
Two problems: (1) The block matrices are not necessarily square. Suppose, for now, that we add additional edges/nodes so that we make the matrix square.
(2) The block matrix is certainly not symmetric. This is not a big problem, as it will just lead to complex eigenvalues.

Approach2:
The usual method
From the block matrix, we can make the following observations.
The top part of the block matrix just represents a path in the directed graph. The bottom part represents the weighted sums of the components of the dependence vectors (i.e., the weighted sum of the first components of all the dependence vectors, the weighted sum of the second components, etc.).

Top part.
--
To do: What is the meaning of the trace of a directed graph? Given that we need a square matrix, should we look at the adjacency representation rather than the incidence matrix representation?
The important question may be: what does a zero cycle mean in terms of the eigenvalues/eigenvectors?
--
Think of a directed graph represented as an adjacency matrix, with each non-zero element annotated by the weight of that particular edge (the weight could be a d-dimensional vector). Now formulate the problem of zero cycle detection in this matrix.
We have to find an n*n matrix X which has a non-zero entry wherever that particular edge is "touched" in a zero-cycle. Further, the non-zero entry is equal to the number of times that edge is touched. Given a matrix X like this, it is easy to detect the zero-cycle (find the directed MST?).
--
Both the self-loops and multiple edges can be removed by a simple method of adding fictitious nodes and edges. We are left with a simple graph (not a multi-graph) whose adjacency matrix has all 0's on the diagonal. Let us call such a matrix A.
Let X be an n*n matrix which has a non-zero element at (i,j) iff the edge between vertices i and j is part of a zero-cycle. What are we minimizing in the program? How do we schedule it?
Now, is the matrix A semi-definite? What does X >>= 0 mean?
Define the definiteness of directed and undirected graphs.
--
Define a directed graph whose edges are annotated with scalar weights. What can we say about the SDness of its adjacency matrix?
--
The spectrahedra of undirected graphs have been well studied. What about the spectrahedra of a directed graph, and what about the eigenvalues of such a graph?
--
Since we are starting with matrices whose diagonal elements are zero, what can we say about the eigenvalues of each matrix? There are no self-loops => the diagonal elements of the matrix are zeroes => the coefficient of the lambda^(n-1) term = a11+a22+a33+... = 0. Also, ...
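A quick numerical check of this observation (any zero-diagonal matrix will do): the coefficient of lambda^(n-1) is -tr(A), so the eigenvalues of a trace-free matrix must sum to zero, even though they are complex in general.

```python
import numpy as np

# a zero-diagonal (trace-free) "adjacency" matrix
A = np.array([[0., 1., 2.],
              [3., 0., 1.],
              [1., 4., 0.]])
eigs = np.linalg.eigvals(A)   # complex in general; their sum is tr(A) = 0
```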
--
Look up the relationship between unitary PRDGs and Hadamard matrices. Look it up in Stanley's book and Knuth.
--
min X
s.t A_1 @ X >>= 0
A_2 @ X >>= 0
A_3 @ X >>= 0
...
A_d @ X >>= 0
X >>= 0
--
Now, A_i can be thought of as the adjacency matrix of the PRDG (a directed graph) annotated with the weights of the ith component. It has no self-loops, so its diagonal elements are zeroes and the trace of A_i is zero: it is called a trace-free matrix (see the trace link). This matters because of the characteristic polynomial induced by A_i. A property of the trace states that tr(A@B) = tr(A^T B).
--
The link on the characteristic polynomial is nice.
Question: many things seem to depend on the positive definiteness of the matrix A_i. How can we prove that it is positive definite? If it is not +veSD, can we apply a transformation to the PRDG so that the resultant graph is +veSD?
Question: what can you say about the \lambda_min of a directed graph? What can you say about the definiteness of a non-symmetric matrix? In particular, of a matrix whose trace is zero?
--

Most of the eigenvalue optimization theory has been developed for real, symmetric matrices. It is known that such matrices have real eigenvalues. Unsymmetric matrices, on the other hand, have complex eigenvalues in general. It is possible, however, to translate the constraint that the real parts of the eigenvalues of a real unsymmetric matrix (say A) be negative into a positive definiteness condition on a real symmetric matrix (P) through Lyapunov's matrix equality (2). Since a necessary and sufficient condition for a real symmetric matrix to be positive definite is that its eigenvalues be positive, the condition on the eigenvalues of the "difficult" unsymmetric matrix A is translated into another condition on the eigenvalues of the "not-so-difficult" symmetric matrix P. In order to avoid the potential non-smoothness arising in eigenvalue optimization, interior-point / logarithmic-barrier-transformation techniques have been successfully applied (Ringertz, 1997). For a comprehensive reference on interior-point optimization, see Fiacco and McCormick (1990). Making use of logarithmic and matrix determinant properties, it will be shown that the potentially non-smooth constraints on the eigenvalues of matrix P may be expressed in terms of the determinant of matrix P, which is a smooth function of the optimization variables.

The Lyapunov matrix equality happens to be the following:
Matrices A, P and Q are related through Lyapunov's matrix equality. Read Lovasz's notes on the same. What does it mean for a directed graph? How can you "make" an undirected graph which is equivalent to it?
According to Horn's Topics in Matrix Analysis,
GA + A*G = H is Lyapunov's equation. (A* means the conjugate transpose, not the plain transpose: check.)
It says that A is positive stable iff there exist an n*n +veSD G and a +ve definite H satisfying it.
Hurwitz matrix
+ve cycles and stability of systems are equivalent!!!!!
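A sketch of the Horn statement using scipy's Lyapunov solver. For a real A, A* is just A^T; the matrix below is a hypothetical positive stable example (eigenvalues 2 and 3), and the solver returns the symmetric G with G A + A^T G = H:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# positive stable (all eigenvalues in the right half plane) but unsymmetric
A = np.array([[2., 1.],
              [0., 3.]])
H = np.eye(2)                              # a +ve definite right-hand side
G = solve_continuous_lyapunov(A.T, H)      # solves A^T G + G A = H
```

The claim to check is that the resulting G is positive definite, turning the condition on the "difficult" unsymmetric A into one on the symmetric G.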

## Thursday, July 14, 2005

### unitarization: a tiling approach

If the dependences in the original PRDG are uniform, the dependences in the tile graph are unitary (from the range {-1,0,+1}). Further, the dependence components are all from the range {0,+1}. Also, the tile graph is a URE, as all nodes are of the same "type". (Check if these statements are true for all tilings.)

Some assumptions: (1) there are a large number of tiles in any dimension (if there is a non-zero number of tiles in that direction, that is) and (2) each tile has a large number of iteration points. What about the boundary points?

So this means that we have two PRDGs: (1) the URE of the tile graph and (2) the SURE of the iteration points inside a tile. Note that each of these is *much* smaller than the original PRDG. However, we will assume that their number is still large enough to justify the model. For example, if we begin with a 2-d square iteration space of 1000*1000 points and divide it into tiles of 100*100 points, we have to deal with 10*10 = 100 tiles in the tile space, and each tile has 100*100 = 10,000 points. Each of these is a considerable number.
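The tile bookkeeping for that example, spelled out:

```python
# 2-d square iteration space of 1000 x 1000 points, cut into 100 x 100 tiles
space, tile = 1000, 100
tiles_per_dim = space // tile        # 10 tiles along each dimension
num_tiles = tiles_per_dim ** 2       # size of the tile graph (the URE)
points_per_tile = tile ** 2          # size of the intra-tile SURE
```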

We need to take care about the boundaries of the tile.

All this might lead to a ratio of the following: the time taken by a free schedule of the tile graph versus the time taken by a free schedule of the original PRDG.

We also might add some models. That could make the problem harder.
--
Now we have an agreement on the following model:
(1) Define a URE with edges corresponding to the dependences between different tiles. It also has some parameters (which define the number of iteration points inside a tile).
(2) Define a SURE which is parametrized by some variables defined in problem (1). Now this SURE could be finite.

Question is: what can you say about the computability/scheduling of the combined system?
--
to do: read mails around march end
--
Memory optimization: Pareto optimality for memory and schedule. There is no "better" point, because of the multi-criterion optimization. We are talking about the solution space, not the feasible space. The set of interesting solutions forms a curve: north-east, south-west points..
--
Objections to using SDP for one-shot scheduling: the multi-dimensional schedule operates in such a way that the output of the solution at one point is given as input to the next LP program (is this as simple as some edges disappearing?). If we want to use a single program, possibly an SDP, what do we minimize?
Right now, with the LP formulation of the MDS, there is no way that we can compare two schedules (what does this mean?) with different numbers of dimensions. We can only compare schedules with the same number of dimensions.
So, we frame the scheduling problem as a polynomial schedule.
--
When is a polynomial representable by an SDP? The relevant sets seem to be called semi-algebraic sets.

## Tuesday, July 12, 2005

### semidefinite programming for multi-dimensional scheduling

The mono-dimensional scheduling problem can be solved by a single linear programming problem. The multi-dimensional scheduling problem, however, requires multiple calls to the LP solver. The minimum number of LP calls is given in the paper DRV-KMW. Can ZC be detected using an SDP solver? Can MDS be done by an SDP solver? The latter is the harder problem, as it also has to return the certificates (the rows of the schedule).

Fact: An edge can be on only one path of the KMW-T. If it lies in a prime node, then there exists a ZC.

The scheduling matrix can be constructed in other ways, like using the KMW method (one LP call per edge) or the KS method (n*log n LP calls).

To do: Weak separation oracle from Lovasz's lecture notes (and also the GLS: Grotschel-Lovasz-Schrijver book?). How to add quadratic constraints while still preserving semidefiniteness.
--
SDP duality holds only under some regularity conditions. What are they?
--
To Do: SDP in mails
--
A linear program is trivially an SDP. If a constraint of an LPP is of the form a.x <= p, we can define a diagonal matrix A with A_i,i = a_i, and we can also define a variable matrix X with X_i,i = x_i. The resultant program is an SDP.
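The trivial embedding, sketched (the helper names are my own; the point is just that a diagonal matrix is PSD exactly when its entries are nonnegative, so the LP's x >= 0 becomes X >>= 0):

```python
import numpy as np

def lp_row_as_sdp_data(a_row):
    """An LP constraint row a.x <= p becomes the diagonal coefficient
    matrix A = diag(a), paired with a diagonal variable matrix X = diag(x)."""
    return np.diag(a_row)

def diag_psd(X):
    """For a diagonal X, the SDP condition X >>= 0 reduces to entrywise
    nonnegativity of the diagonal."""
    return bool((np.diag(X) >= 0).all())
```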
--
We could do a similar thing with the vectors q or x. What about the vector (1,-1) for the edges? The 2nd eigenvalues are non-+ve, which means that we cannot directly use that value. However, SDP only requires that the variable matrices are SD, not their coefficients.
--
Is this a SDP?
max sum_e z_e
z^2 <= 1
--
To do look up the different SDP formulations from Lovasz's notes.

## Monday, July 11, 2005

### Separating hyperplanes, witnesses for unitary graphs

A separating vector is a vector which belongs to the dependence cone.
A witness is a special form of separating vector: one that induces only +ve cycles in the given graph.
--
To Do: Read DRV or DV-KMW and draw the dual polyhedron for the examples from KMW.
To do: DRV has a PRDG in chapter 5. work on it.
--
The hyperplane/cycle algorithm does not seem to work when all the weights are (0,0,0). Is this a case that can be handled easily?
--
Example from DRV-EX-213: The input graph is already in unitarized form. DRV say that all the vectors that are orthogonal to (1,0,0) vanish. Our algorithm identifies the plane i=0 (the one orthogonal to (1,0,0)) to be what CM-Z would call a separating hyperplane. The edges from S2 to the bottom node and two more edges disappear.
A few questions: What about the edge from S1 to the bottom node? Why does it not disappear? What about the vectors (0,0,0)?
--
Prove/disprove the following (important): the unitariness of the vectors (q,v).
Solution (for the vector (q,v)): The vector q need not be unitary. A graph with three edges having the vectors e1=(1,0), e2=(-1,+), e3=(-1,-1) would have the solution 2q1 = q2 + q3. A circulation would be (q1,q2,q3) = (2,-1,-1), which is not unitary.
--
Possible choices for the level sets: (1) the norm, (2) the square of the norm, (3) the number of non-zero components in the dependence vectors.
One more advantage of unitarization: the level sets of the vectors are from a finite set. If the dependences are not unitarized, the level sets are from a possibly infinite set.
--
Here may be the tentative algorithm:
(1) Unitarize the graph.
(2) Find the set of edges with the largest level sets. This can be done by
(a) finding the largest level set,
(b) selecting the edges whose norm equals that particular level set.
(3) Among the edges with the greatest level set, find the plane which is parallel to the maximal number of edges. This can be done by ........
--
The Unitarization Problem: Unitarization for ZC-D, multi-dim scheduling, MDS followed by optimal memory allocation, simultaneous MDS and OMA, tiling for parallelism, tiling for locality.
--
To do: What does the dual polyhedron look like? What is special about it when it has been unitarized? What do its level sets look like?

## Saturday, July 09, 2005

### separating hyperplanes, certificates, Ziegler

Main idea of CM-Z: Computing a certificate (a weakly separating hyperplane) and then computing the partition of the graph (by an All Pair Shortest Paths).
Example from CM-Z (page 811): (the only other example, the one on page 793, seems too simple, though it is a unitary PRDG to begin with). Worked on CM-Z-EX-811. Obtained the same decomposition as given in the paper. If we unitarize, the unitarized graph also seems to have the same decomposition tree (did not work on it completely). The maximization along the diagonals seems to be working for the decomposition. Obtained the correct separating vector.
What are the definitions and intuitions of the separating vector, the witness vector and g(lambda) from the example?
How is the vector lambda different from the scheduling matrix? Since lambda is a separating (column) vector, it is used to decompose the graph. The kth row of the scheduling matrix (where k is the depth at which lambda is found) may be a simple function of lambda.

Since CM-Z talk only about a vector lambda, is it that they do not do multi-dim scheduling? The main problem that CM-Z handle is the detection of ZC. In a later section, they mention how their algorithm can be applied to scheduling and say that MDS is a simple extension. The depth of the decomposition tree in the example they give is 2, which means that it cannot have a linear schedule and has to have a multi-dim schedule.
--
To do: find definitions of ORTH, CIRC, lambda, partition.
To do: the problem that CM-Z solve is problem 4.7; read problem 4.7.
To do: read fast algorithms for the All Pairs Shortest Path problem and methods of solving the problem for unitary graphs.
--
To do: Is the unitarization along diagonals right? POSTSCRIPT: WHAT DOES THIS MEAN??
--
What kind of graph, unitary or normal, will we be giving to the APSP solver? If it is unitary, can we do better?
--
From Ziegler's book on Polytopes: convex polytopes and non-convex polytopes are equivalent mathematically, not algorithmically. What does this mean? Also, this will be explained in chapter 1 of the book.
--
a d-dimensional crosspolytope == simplicial polytope
--
There are a few variants of Farkas lemma. They are based on
• theorems of the alternative
• transposition theorems
• duality theorems
• good characterizations
• certificates for validity
• separation theorems
--
It seems that the theorems on polytopes become really nice and simple when the underlying polygons are regular n-gons, called simplicial polygons.
--
JargonWatch: Affine multi-dimensional scheduling of unitarized polyhedral reduced dependence graphs using a decomposition of Karp-Miller-Winograd
--
When we unitarize the PRDG, of the (n+d) dimensions, we obtain an n-gon in all dimensions. The surface lies inside a unit sphere of (n+d) dimensions. The question seems to be which diameter of the sphere has the greatest weight (measured as the largest number of edges parallel to it). What if we take an arbitrary diameter and split all the edges into three classes: (i) ones parallel to it, (ii) ones in a "greater than" direction to it and (iii) ones in a "lesser than" direction to it? The greater-than direction can be determined by evaluating the dot product of that particular vector with the pivot; the lesser-than direction similarly. Once the partitioning is done, the algorithm can work on any of the three partitions greedily. Is this right???
--
Why can we not find the median of the slopes of the vectors? This can be found in linear time.
## Thursday, July 07, 2005

### separating hyperplanes, memory allocation

QR say that for a given schedule, the projective memory allocation method gives the minimal amount of memory required.
--
What can we do with the separating hyperplanes to minimize the amount of residual memory?
--
The general approach to solving the ZC detection problem: compute a separating hyperplane called a certificate (using an LP in the case of KMW, KS, DV-KMW, or in a faster way as in CM-Z) and use the certificate to partition, or decompose, the graph. CM-Z say that the depth of the decomposition tree is bounded by the dimension of the weights. Should it be n? No. It should be min(n,d).
--
More about QR memory allocation: Darte and others have also proposed a memory allocation algorithm. Their paper, DSV-LM has a nice overview of QR.
--
To do: QR has a nice overview of the scheduling problem in general. Read it.

## Wednesday, July 06, 2005

### separating hyperplanes for unitary graphs

Taking the cue from CM-Z about splitting the ZC problem into (1) finding a hyperplane that separates and (2) partitioning the graph using the hyperplane, we solve problem (1) in faster time, using an oracle.
--
Theorem on convex polytopes: two closed, convex, disjoint sets can be separated by a hyperplane.
Yes, but how to find it fast when the distances are in the range {-1,0,+1}? Ideas: do a recursive separation?
--
Important theorem on convex polytopes: A convex polytope is the sum of a polytope and a cone.
--
When we unitarize, we get some kind of pattern in the incidence matrix (the top part of B). What is it?
--
Questions about KMW-EX-577: What is special about edges e2 and e4, the ones from v1 to v2 and back again? What makes them disappear? How are they weakly connected? What makes edges e1 and e3, the self loops, stay? When we draw the vectors on a 2-dim hypercube, which happens to be a square, e2 and e4 are parallel (anti-parallel, to be precise) and are the maximal number of vectors that can form a ZC. Is this because we are maximizing the set of vectors that can stay for the next level of decomposition?
--
Construct an example where there are 2 edges which are anti-parallel in one direction, like {(1,-1),(-1,1)}, and another set of edges (say 3) which are anti-parallel in another direction, like {(1,1),(-1,-1),(1,1)}. What will KMW return?
--
KMW notes: page 586-587, example 10, fig. 9 and page 587, example 11, fig. 10. KMW generate two schedules: the free schedule and the smoothed schedule.
--
To do: Write the duals of the KMW-EX and get the polygons.
To do: Prove the conjecture: get the hyperplane which passes through the origin and is parallel to the maximum number of vectors. The vectors that are not parallel to this hyperplane are weakly connected. So it seems that a sort (bucket sort?? in linear time???) of the dependence vectors, invariant to a change of sign (which means that (1,-1) and (-1,1) map to the same bucket), is all that is required.
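A sketch of the sign-invariant bucketing in one dict pass, i.e. linear time (the canonicalization rule -- flip the sign so the first nonzero component is positive -- is my own choice):

```python
from collections import defaultdict

def biggest_parallel_class(vectors):
    """Bucket dependence vectors up to a global change of sign, so that
    (1,-1) and (-1,1) share a bucket, and return the largest bucket."""
    buckets = defaultdict(list)
    for v in vectors:
        # canonical representative: first nonzero component made positive
        lead = next((x for x in v if x != 0), 0)
        key = tuple(v) if lead >= 0 else tuple(-x for x in v)
        buckets[key].append(v)
    return max(buckets.values(), key=len)
```

On the five-edge example above ({(1,-1),(-1,1)} against {(1,1),(-1,-1),(1,1)}), the second class wins.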

To do: Verify the conjecture for two PRDGs: one that is easy to reason about, and one that is complicated enough that only the algorithm finds the hyperplane.
--
Look at the columns of the matrix B and ask: when are two columns linearly dependent, and how can we maximize the number of dependent edges?
--

## Tuesday, July 05, 2005

### Unitary graph zero cycle detection

We can unitarize a PRDG arbitrarily, so that all the components of the graph are from the set {-1,0,+1}.

In 2 dimensions, all the edge weights have been mapped to the 9 points of a unit square:
--------
1 2 3
4 5 6
7 8 9
-------

Now point 5 does not add to the weight of any cycle.
Call the points (1,4,7) -ve-red and the points (3,6,9) +ve-red.
Call the points (7,8,9) -ve-blue and the points (1,2,3) +ve-blue.
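A sketch of this 9-point classification, assuming the grid is numbered row-major with weight (-1,+1) at point 1 and (+1,-1) at point 9, so that -ve-red marks a -1 first component and +ve-blue a +1 second component:

```python
def classify(w):
    """Map a unitary 2-d weight w = (x, y), with x, y in {-1,0,+1},
    to its grid point (1..9, row-major) and its color classes."""
    x, y = w
    point = 3 * (1 - y) + (x + 2)   # row from y, column from x
    colors = []
    if x == -1: colors.append("-ve-red")
    if x == +1: colors.append("+ve-red")
    if y == -1: colors.append("-ve-blue")
    if y == +1: colors.append("+ve-blue")
    return point, colors
```

The weight (0,0) lands on point 5 with no color, matching the observation that it never adds to the weight of a cycle.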

--
To do: Look at the examples in KMW
--
Unitarization may not be necessary if we somehow convert a distance vector into its angle, represented by a value on the unit hypercube, and its magnitude, represented by ????????????
--
In higher dimensions, we can consider the 3X3 matrix with a smaller number of nodes as the base case, and properly split graphs with large numbers of nodes, edges and high dimensions.
--
To Do: do a search for fast linear programming algorithms when the matrices are unitary. May want to search for linear programming unit cost etc...
--
Imagine the graph annotated by the vectors. Now decompose the components into at most two components, each from {2,4,6,8}. Remove the edges which do not have components that influence the components on the other side?????
---

### Arvind Sharma's Experiential Approach to Advaita Vedanta

The third part of the book covers Advaita Vedanta in an experiential approach. Sharma asks, why an experiential approach? Are not the usual scriptural and rational approaches enough? He answers the question by giving the example of the three states in which human beings can exist: waking, dreaming and deep sleep. It seems that the
scriptural approach ==> deep sleep
rational approach ==> waking
experiential approach ==> dreaming state

If you ask most people to group two of these as close together and the third as different, most would give the configuration {waking}, {dreaming, deep-sleep}. This is because the states in the second set are "without activity". Advaita notes that the following division is more appropriate: {waking, dreaming}, {deep-sleep}. This can be substantiated by the famous examples of Janaka dreaming that he was a beggar, and Chuang-Tzu dreaming that he was a butterfly. Incidentally, it was Chuang-Tzu, not Lao-Tzu, who dreamed so; he is said to be second only to Lao-Tzu as a representative of Taoism.

## Monday, July 04, 2005

### Arvind Sharma's book on Advaita Vedanta

Reading the book "Advaita Vedanta: An Introduction" by Arvind Sharma. The Amazon link is this. In the book, he takes a simple introductory approach to the subject. The book seems a little verbose at places, but has summaries at the end of large paragraphs. It may be useful for someone to read this book prior to reading to Deutsch's rather terse book(Amazon link). Arvind Sharma takes a tri-fold approach to explain the concept of Advaita Vedanta: They are (1) the Scriptural, (2) the Rational and (3) the Experiential approach.

Shri. Sharma, in the introduction of the book, begins by stating that both philosophy and religion try to pose the fundamental question "What is real?"
The book begins with how the Hindus wanted to attain liberation in this world itself (jivanmukti) rather than postponing it to a later time, either at death or on the judgement day, as in Christianity and Islam.

The introduction explains the six systems of Indian philosophy nicely. My notes from Dasgupta's book on the same subject are here. The two mostly agree. Sharma orders the astika systems as (Nyaya, Vaiseshika), (Samkhya, Yoga) and (Mimamsa, Vedanta). The first pair, he notes, are closer to the naastika systems, while the second pair accept the Vedas a-posteriori. Sharma also notes that Mimamsa and Vedanta take a very close approach to how the Vedas are interpreted, i.e., both schools base themselves on the Vedas. They are based on the "anta" part of the Vedas. Though, as in some other religions, Hindus accept that the Vedas were received, Hindus also recognize the limitlessness or infinity of the Vedas. This also means that Hindus attribute no specific time or place or persons -- either cosmic or human -- to the "receiving process" of the Vedas.
Also he explains the four parts of each Veda as (i) Mantras or Samhitas: the hymns in praise of gods, (ii) Brahmanas: the prose explanations of the ritual use of these hymns (iii) Aranyakas: reflections on the significance of the ritual and (iv) Upanishads: Secret texts meant to communicate the highest mysteries which go beyond ritual into the realm of spiritual knowledge.
In the explanation of the first Mahavakya, Aham Brahma Asmi (I am Brahman), Sharma nicely notes that though everything "is" Brahman, the way in which Maya (or the universe) "is" Brahman is not the same as the way in which Atma "is" Brahman.

The conclusion has the following translation of the famous verse on the main tenet of Advaita:

The non-duality of the Brahman,
The non-reality of the world
and the non-difference of the Atman from the Brahman
These constitute the teachings of Advaita

Considering that one of Mr. Sharma's books is dedicated to Eliot Deutsch, it is surprising that this book does not cite Deutsch's book.

On another note, Arvind Sharma's experiential approach to Advaita seems to be unique and special. This is partly because, one of the primary questions of Advaita is "What is Real?". This question has to arise in the seeker's mind, after he feels that some of his experiences are not real!

A reader on the advaitin mailing list had strongly recommended the book "The Rope and the Snake" by him. This is the Amazon link.

## Sunday, July 03, 2005

### depth of KMW-T and oracles

Suppose an oracle tells us that the depth of the KMW-T is 1. How can we verify that? By finding the witness vector or the separation vector. Which one?
--
If we generalize the idea of the oracle further and say that we can verify in some time T \in POLY(n,m,d,max-distance) whether the KMW-T has depth k or not: suppose an oracle tells us that the depth of the KMW-T is <= k. A (binary?) search across k would give us the certificate to verify the statement. This would make the running time of the algorithm T * log k. How can we do better?
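The (binary?) search across k can be sketched as follows, assuming the verification procedure is monotone in k (if depth <= k is verifiable, so is depth <= k+1). `verify` stands in for the hypothetical oracle-backed check, each call costing time T:

```python
def min_depth(verify, lo, hi):
    """Binary search for the least k in [lo, hi] with verify(k) True,
    assuming verify is monotone. Makes O(log(hi - lo)) calls, for a
    total cost of T * log(hi - lo)."""
    while lo < hi:
        mid = (lo + hi) // 2
        if verify(mid):
            hi = mid     # depth <= mid certified; search lower half
        else:
            lo = mid + 1  # depth must exceed mid
    return lo

# hypothetical oracle for a KMW-T whose true depth is 3
min_depth(lambda k: k >= 3, 1, 10)  # -> 3
```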
--
to do:
• Get the definitions of witness and separation vectors (of various kinds).
• understand the concepts of independent sub-space and maximal independent subspace.
--
Prime and composite nodes: In the previous post, there were a couple of analogies between the different decomposition trees. Let us call a leaf of a decomposition tree prime if it is annotated with a component which is a node of the original graph. Otherwise, call the leaf composite. The whole problem of ZC-detection is just seeing whether there exists a composite leaf.
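Restated on the decomposition tree, ZC-detection is just a scan of the leaves; in this sketch a leaf is represented simply as the set of original nodes in its component:

```python
def has_zero_cycle(leaves):
    """ZC-detection restated on the decomposition tree: a leaf is
    'prime' if its component is a single original node, 'composite'
    otherwise; a zero-weight cycle exists iff some leaf is composite."""
    return any(len(component) > 1 for component in leaves)

has_zero_cycle([{1}, {2}, {3}])   # False: every leaf is prime
has_zero_cycle([{1}, {2, 3, 4}])  # True: {2,3,4} decomposes no further
```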
--
Define an edge as casting a shadow on another component if the edge is in the maximal independent subspace of the component????? So, can we begin with a node and keep adding edges to it till we detect a ZC?
--
To do: get the idea of maximal independent subspace and difference between the witness and certificate of CM-Z.
--
Top-down decomposition Vs. Bottom-up construction: How to unify two components?
--
To-do: understand the KMW-EX with depths of 1 and 2. How and why are the depths of that size?
to-do: frame an LP which takes two connected components and returns a parent node with a certificate. Suppose we want to add a third node which is in the same strongly connected component as these two; it need not have a parent which is different from these two. It can be attached to the same parent as the original. The certificate stored at the parent may or may not change.
--
The KMW-EX on page 577 is very useful for learning about the longest paths. Visualize a bounding box, with nodes of two colors and the dependences. If there is a circuit in G^\infty, there is a loop in the PRDG.
--
The memory optimization by building through subspaces and simultaneously increasing the "range" of subspaces.
## Friday, July 01, 2005

### depth of the decomposition tree

Analogies for the decomposition tree: If you compare the decomposition tree of KMW with the comparison tree of a sorting algorithm (like QSORT), strange things come into the picture. The analogy between these two -- let's call them KMW-T and QSORT-T -- makes sense. We are decomposing the input, which is a graph in KMW or an array in QSORT, by a criterion, and constructing a tree which stores the certificates for the decomposition at the internal nodes and the individual components at the leaves. However, the KMW-T need not be a binary tree, while the QSORT-T has to be binary, because of the linear order in the latter.
So, a better comparison for the KMW-T is with the decomposition tree of an algorithm that detects strongly connected components in a directed graph. Call such a tree an SCC-T. Both the KMW-T and the SCC-T look for some kind of "weakly connected components" and break the edges that connect them. At the leaves of both trees are components which cannot be decomposed further (trivially, they could be individual nodes). At the internal nodes of both trees are some kind of "certificates", which can be used to verify the validity of the decomposition. Both algorithms are greedy(?) and recursive, in the sense that they do the same on the smaller components. Both algorithms are top-down.
Postscript (added on 9-JUL): If we want to be more precise about the certificates etc., look up the example in Schrijver's book about the Hopcroft-Tarjan planarity testing. They reduce the problem to bipartiteness testing and planarity testing of smaller graphs.
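A minimal sketch of the SCC-T analogy above: compute the strongly connected components (Kosaraju's two-pass DFS below), make each component a leaf, and store the cross-component edges at the root as the "certificate" for the decomposition. This is only the one-level analogue, not the recursive KMW-T:

```python
def sccs(graph):
    """Strongly connected components by Kosaraju's two-pass DFS.
    graph: dict mapping each node to a list of its successors."""
    def dfs(g, start, seen, out):
        # iterative DFS appending nodes to out in post-order
        stack = [(start, iter(g[start]))]
        seen.add(start)
        while stack:
            node, succs = stack[-1]
            advanced = False
            for nxt in succs:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(g[nxt])))
                    advanced = True
                    break
            if not advanced:
                stack.pop()
                out.append(node)

    order, seen = [], set()
    for v in graph:
        if v not in seen:
            dfs(graph, v, seen, order)
    rev = {v: [] for v in graph}           # reversed graph
    for u in graph:
        for w in graph[u]:
            rev[w].append(u)
    comps, seen = [], set()
    for v in reversed(order):              # reverse finishing order
        if v not in seen:
            comp = []
            dfs(rev, v, seen, comp)
            comps.append(set(comp))
    return comps

def scc_tree(graph):
    """One-level SCC-T: the root stores the cross-component edges
    (the 'certificate' for the decomposition); each leaf is an SCC."""
    leaves = sccs(graph)
    where = {v: i for i, c in enumerate(leaves) for v in c}
    cut = [(u, w) for u in graph for w in graph[u]
           if where[u] != where[w]]
    return {"certificate": cut, "leaves": leaves}
```

For example, on the graph 1 <-> 2 -> 3 the leaves are {1,2} and {3}, and the certificate is the single cut edge (2,3).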
--
Question: Is there any bottom-up SCC algorithm? If there is one, can the KMW-T be constructed in a bottom-up fashion? Given the KMW-T, QR are doing a memory-allocation, traversing it bottom-up (is this right?). If so, can it be done top-down, using the certificates, at the same time as each certificate is obtained?
--
fact: A decomposition tree never has a depth of more than n. This seems easy to prove: each time the decomposition algorithm is run at a level, at least one node of the PRDG gets disconnected, or we have a ZC. Is this a situation that we want to avoid, or is having a \Theta(n) depth good?
--
What is the difference between the parallelism obtained from two PRDGs, each with n nodes, one inducing an O(log n) depth decomposition tree and the other an O(n) depth tree? More importantly, what property of the graph makes it induce a KMW-T of depth O(log n) or O(n) or O(1)? This is the more important question because the depth of the tree is a property of the graph, not of the algorithm. What property? How can it be detected, in other ways? How can you design an algorithm which gives another depth for the same graph? Why are constant depth trees more useful/interesting?
Question: Is the separation of the components of the graph based on weakly separating hyperplanes the right way to obtain maximal parallelism or even minimal memory-allocation? Can we think of a median-separating hyperplane (a la median) that divides the components (collections of nodes and edges) equally?
Is this the only way of decomposing the tree?
--
Most of the above questions can be answered by knowing more about KMW-decomposition and reasoning about the hyperplanes which separate some components. How, why and is that the only way?
--
CM-Z ideas: independent subspace, maximal independent subspace is the same as separation of the space into independent isomorphic flats
--
To do: Get the definitions of weakly and strongly separating hyperplanes from DV-KMW. There are many types of separation of convex polyhedra by a hyperplane, like "freely separated", "properly separated", "strictly separated" and "strongly separated". Some of them are explained intuitively here (a pdf link). The lecture notes are from a course on non-linear optimization. The course page is this. Look at the lecture notes and recitations.
To do: Understand the notions of separating hyperplanes from KMW with an example and frame an example when the depth of the decomposition tree is n and hence write the parallel code.
Question: What notion of separating hyperplane do CM-Z use?
--
Separability is important for ZC-detection and multi-dimensional scheduling. How is it useful for memory-allocation? QR do (1) ZC detection, followed by (2) MDS, followed by (3) optimal memory-allocation. Can all of (1), (2) and (3) be done together with a convex kind of solution?
--
Worked the examples from KMW (two nodes and depth 2 on page 577, two nodes with depth 1 on page 585 and two nodes with depth 2 on page 585), DV-94-24 (lone example, depth 2), DV-KMW??????? some edges simply disappear, !!!
Arbitrarily unitarized the KMW-EX on page 585 (fig7). The depth of KMW-T is still the same (see theorem below).
Note: the example (fig 7) of KMW on page 585 with depth 1.
To do: KMW big example?
--
Theorem: The depth of the decomposition algorithm is invariant to unitarization. Proof????
--
notes about TUMity: a self edge only induces zeroes in its column of the incidence matrix. Multiple parallel edges do not spoil the TUMity of the incidence matrix, as they are different edges.
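These two notes can be checked mechanically: build the node-arc incidence matrix (+1 at the tail, -1 at the head, so a self edge yields an all-zero column) and brute-force total unimodularity on a tiny example. The graph below is made up for illustration:

```python
from itertools import combinations

def incidence(nodes, edges):
    """Node-arc incidence matrix: one column per edge, +1 at the tail,
    -1 at the head; a self edge contributes an all-zero column."""
    idx = {v: i for i, v in enumerate(nodes)}
    M = [[0] * len(edges) for _ in nodes]
    for j, (u, v) in enumerate(edges):
        if u != v:                  # self edge: leave the column zero
            M[idx[u]][j] += 1
            M[idx[v]][j] -= 1
    return M

def det(A):
    """Determinant by cofactor expansion (fine for tiny matrices)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] *
               det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def is_tum(M):
    """Brute-force total unimodularity: every square submatrix must
    have determinant in {-1, 0, +1}."""
    m, n = len(M), len(M[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                sub = [[M[r][c] for c in cols] for r in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True

# a self edge (a,a) plus two parallel edges (a,b): incidence stays TU
M = incidence(["a", "b"], [("a", "a"), ("a", "b"), ("a", "b")])
# M == [[0, 1, 1], [0, -1, -1]]
```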
--
What has the TUMity of the unitarized graph to do with a decomposition tree of height 1?????
--
This page has corrections to Tarjan's book. They are the ones that I have noticed before.
--
The QR algorithm needs the (possibly multi-dimensional) schedule for each variable. Can the memory optimization be done in tandem with the KMW-decomposition algorithm?