Neil Gunther has sent you a link to a blog:
Athanasios, Here is my more considered response to the issue of a GUI for PDQ. Hope this clarifies. Best wishes.
Blog: Performance Dynamics Agora
Post: Why Doesn't PDQ Have a GUI?
Link: http://perfdynamics.blogspot.com/2007/09/why-doesnt-pdq-have-gui.html
--
Powered by Blogger
http://www.blogger.com/
Thursday, 15 November 2007
Monday, 27 August 2007
PDQ BUGS
- over 10 nodes in a network throws Exception when EXACT solve method / with APPROX it works but I don't whats the quality of the results.
- GetThrouput() / GetResidenceTime() / GetUtilization() methods have a problem when they are about o return results for a ISRV node (delay cetre)... totally unexpected results are shown instead of 100 and 0 ... I really don't know why.
Lack of support for MSQ nodes:
I have implemented multiserver queues in the PDQ C-language only library, and it is now in beta test.
The goal is to allow you to create OPEN-circuit models (initially) by defining any sequence of M/M/m queues. Instead of introducing a completely new procedure, like PDQ_CreateMultiNode (or PDQ::CreateMultiNode described in Chapter 6 of the Perl::PDQ book), I have come up with a simpler idea. You will call PDQ_CreateNode() in the usual way but, by asserting a new MSQ flag to tell PDQ the node is multi-server queue, you then specify the integer number of servers (m) for that node. For example, an m = 10 multiserver PDQ node named "mserver" would be instantiated using the standard CreateNode procedure with the following arguments:
PDQ_CreateNode("mserver", 10, MSQ).
The PDQ generic report has also been altered to include the new MSQ node.
More details will be forthcoming as the implementation unfolds. Stay tuned.
- GetThrouput() / GetResidenceTime() / GetUtilization() methods have a problem when they are about o return results for a ISRV node (delay cetre)... totally unexpected results are shown instead of 100 and 0 ... I really don't know why.
Lack of support for MSQ nodes:
I have implemented multiserver queues in the PDQ C-language only library, and it is now in beta test.
The goal is to allow you to create OPEN-circuit models (initially) by defining any sequence of M/M/m queues. Instead of introducing a completely new procedure, like PDQ_CreateMultiNode (or PDQ::CreateMultiNode described in Chapter 6 of the Perl::PDQ book), I have come up with a simpler idea. You will call PDQ_CreateNode() in the usual way but, by asserting a new MSQ flag to tell PDQ the node is multi-server queue, you then specify the integer number of servers (m) for that node. For example, an m = 10 multiserver PDQ node named "mserver" would be instantiated using the standard CreateNode procedure with the following arguments:
PDQ_CreateNode("mserver", 10, MSQ).
The PDQ generic report has also been altered to include the new MSQ node.
More details will be forthcoming as the implementation unfolds. Stay tuned.
Thursday, 23 August 2007
PageRank Algorithm
PageRank(DirectedGraph graph, double bias, String edgeWeightKeyName)
Specialized constructor that allows the user to specify an edge key if edges already have user-defined weights assigned to them.
This algorithm measures the importance of a node in terms of the fraction of time spent at that node relative to all other nodes. This fraction is measured by first transforming the graph into a first-order Markov chain where the transition probability of going from node u to node v is equal to (1-alpha)*[1/outdegree(u)] + alpha*(1/|V|) where |V| is the # of vertices in the graph and alpha is a parameter typically set to be between 0.1 and 0.2 (according to the authors). If u has no out-edges in the original graph then 0 is used instead of 1/outdegree(v). Once the markov chain is created, the stationary probability of being at each node (state) is computed using an iterative update method that is guaranteed to converge if the markov chain is ergodic.
PageRank:
PageRank is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element E is also called the PageRank of E and denoted by PR(E).
PageRank was developed at Stanford University by Larry Page (hence the name Page-Rank[1]) and later Sergey Brin as part of a research project about a new kind of search engine. The project started in 1995 and led to a functional prototype, named Google, in 1998. Shortly after, Page and Brin founded Google Inc., the company behind the Google search engine. While just one of many factors which determine the ranking of Google search results, PageRank continues to provide the basis for all of Google's web search tools.[2]
Simplified PageRank algorithm
Assume a small universe of four web pages: A, B, C and D. The initial approximation of PageRank would be evenly divided between these four documents. Hence, each document would begin with an estimated PageRank of 0.25.
If pages B, C, and D each only link to A, they would each confer 0.25 PageRank to A. All PageRank PR( ) in this simplistic system would thus gather to A because all links would be pointing to A.
PR(A)= PR(B) + PR(C) + PR(D).\,
But then suppose page B also has a link to page C, and page D has links to all three pages. The value of the link-votes is divided among all the outbound links on a page. Thus, page B gives a vote worth 0.125 to page A and a vote worth 0.125 to page C. Only one third of D's PageRank is counted for A's PageRank (approximately 0.083).
PR(A)= \frac{PR(B)}{2}+ \frac{PR(C)}{1}+ \frac{PR(D)}{3}.\,
In other words, the PageRank conferred by an outbound link L( ) is equal to the document's own PageRank score divided by the normalized number of outbound links (it is assumed that links to specific URLs only count once per document).
PR(A)= \frac{PR(B)}{L(B)}+ \frac{PR(C)}{L(C)}+ \frac{PR(D)}{L(D)}. \,
In the general case, the PageRank value for any page u can be expressed as:
PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)},
i.e. the PageRank value for a page u is dependent on the PageRank values for each page v out of the set Bu (this set contains all pages linking to page u), divided by the number L(v) of links from page v.
Reference
[] "The Anatomy of a Large-Scale Hypertextual Web Search Engine by L. Page and S. Brin, 1999"
Specialized constructor that allows the user to specify an edge key if edges already have user-defined weights assigned to them.
This algorithm measures the importance of a node in terms of the fraction of time spent at that node relative to all other nodes. This fraction is measured by first transforming the graph into a first-order Markov chain where the transition probability of going from node u to node v is equal to (1-alpha)*[1/outdegree(u)] + alpha*(1/|V|) where |V| is the # of vertices in the graph and alpha is a parameter typically set to be between 0.1 and 0.2 (according to the authors). If u has no out-edges in the original graph then 0 is used instead of 1/outdegree(v). Once the markov chain is created, the stationary probability of being at each node (state) is computed using an iterative update method that is guaranteed to converge if the markov chain is ergodic.
PageRank:
PageRank is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element E is also called the PageRank of E and denoted by PR(E).
PageRank was developed at Stanford University by Larry Page (hence the name Page-Rank[1]) and later Sergey Brin as part of a research project about a new kind of search engine. The project started in 1995 and led to a functional prototype, named Google, in 1998. Shortly after, Page and Brin founded Google Inc., the company behind the Google search engine. While just one of many factors which determine the ranking of Google search results, PageRank continues to provide the basis for all of Google's web search tools.[2]
Simplified PageRank algorithm
Assume a small universe of four web pages: A, B, C and D. The initial approximation of PageRank would be evenly divided between these four documents. Hence, each document would begin with an estimated PageRank of 0.25.
If pages B, C, and D each only link to A, they would each confer 0.25 PageRank to A. All PageRank PR( ) in this simplistic system would thus gather to A because all links would be pointing to A.
PR(A)= PR(B) + PR(C) + PR(D).\,
But then suppose page B also has a link to page C, and page D has links to all three pages. The value of the link-votes is divided among all the outbound links on a page. Thus, page B gives a vote worth 0.125 to page A and a vote worth 0.125 to page C. Only one third of D's PageRank is counted for A's PageRank (approximately 0.083).
PR(A)= \frac{PR(B)}{2}+ \frac{PR(C)}{1}+ \frac{PR(D)}{3}.\,
In other words, the PageRank conferred by an outbound link L( ) is equal to the document's own PageRank score divided by the normalized number of outbound links (it is assumed that links to specific URLs only count once per document).
PR(A)= \frac{PR(B)}{L(B)}+ \frac{PR(C)}{L(C)}+ \frac{PR(D)}{L(D)}. \,
In the general case, the PageRank value for any page u can be expressed as:
PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)},
i.e. the PageRank value for a page u is dependent on the PageRank values for each page v out of the set Bu (this set contains all pages linking to page u), divided by the number L(v) of links from page v.
Reference
[] "The Anatomy of a Large-Scale Hypertextual Web Search Engine by L. Page and S. Brin, 1999"
Wednesday, 22 August 2007
The Evolution of Queueing Networks for Computer System Analysis
The interaction between the system workload and the system resources determine the behaviour of a system in general. During the early period of queueing analysis the system was treated as a single queue which was serving an infinite population of tasks or jobs. This kind of model, which is based on the classical queueing theory, gives no clues to the relation between the external behaviour and the internal structure of the computing system. Additionally, this model has the limitation of the infinite population. Thus it can not represent the common situation of existing only a finite population of users.
Fig 1.0
In order to overcome this limitation, another model was created, which introduces a number of N terminals to the system, one for each user. In this model the number of jobs is N, and each of these jobs can be in one of the following states:
- “thinking” at the terminal,
- waiting in the queue of the central system,
- or being served by the central system.
Fig 1.1
However, this model, which is known as the “machine repairman” model, fails to represent the complexity of a typical computing system.
A significant step for the evolution of queueing networks occurred with Jackson's model. Jackson has developed a technique for solving a special category of queueing networks. These networks of queues were only involving a single class of jobs. There was also a distinction between open and closed queueing networks. In an open queueing network the jobs were only permitted to enter or leave the network, while in a closed queueing network the jobs were remaining inside. In Jackson's approach each node/queue contains one or more parallel servers, while the service time distributions, for each of these nodes, are assumed to be exponential.
A state of a network which contains M nodes/stations can be expressed a vector:
and expresses the number of jobs at node/station i.
The objective of analysing a Jackson's queueing network model is to find a closed form expression for the equilibrium probability of finding the network in any given state , which is:
Knowing the probabilities of this equilibrium state, several performance variables for the evaluation of the system can be derived, such as the utilization and the average queue length of each node and the throughput of the workload of the network.
If is the marginal probability of finding n jobs at the node then:
The probability of a node to be busy is called utilization it is computed as:
Jackson also proved something that it is termed as a “product form” which defines that, in an open network, the equilibrium probability of state , factors into the product of each of the marginal distributions:
Later, and without knowing about the earlier work of Jackson, two scientists, Gordon and Newell, obtained the equilibrium state probabilities for closed queueing networks. Again the service time distributions for each node was assumed to be exponential. The closed network they studied had a finite number of jobs, N and a finite number of nodes, M. Additionally, each node, i, may have a finite numberof parallel identical servers which have exponentially distributed service times as well. Independently of the state of the network, one job that has just serviced from the node i will be transmitted to the node j with probability .
A state of a network which contains M nodes and N jobs can be expressed as a vector again:
Now, because the network is closed, the sum of all the , must be N.
Gordon and Newell proved what exactly Jackson has proved for the open queueing networks some time ago, but applied to closed queueing networks this time. The equilibrium probability of state in a closed queueing network also satisfies a “product form”:
where G(N) is the normalisation constant which has been chosen to make all the feasible state probabilities sum to one:
where depends on the characteristics of the node:
, if node i is load dependent
OR , if node i is load independent.
In order to evaluate the normalisation constant G(N), a large number of operations is required. Thus, though the above product form solutions are easily expressed, the evaluation of the state probabilities can be computationally expensive. Additionally, the accuracy of the results is in doubt because there is a large number of finite precision arithmetic operations involved.
An efficient computational algorithm for evaluating the normalisation constant was developed by Buzen in the early seventies. Moreover, he studied ways of obtaining the performance measures of a queueing network, like utilisation, throughput and queue length, creating algorithms with simple functions that involve the normalisation constant and the model parameters. In that way more complex systems could be effectively studied.
Continuing his research, Buzen, tried to construct models of queueing networks, that would represent accurately the flow of the several jobs through a computer system. The result of his research was the construction of the “Central Server Model” of a multiprogramming system. In this model the nodes represent the processors of the computer system, while the several jobs/programs move in the direction of the arrows. A Job in this model can visit one node/processor at a time and when one job arrives to a node/processor which is busy, then a queue is formed. Using this type of queueing network models, bottlenecks of a system could be traced easier.
Fig 1.2.
There are two principal limitations to the models that Jackson and Gordon and Newell have studied. In these queueing networks, there is only one class of jobs, and all service time distributions are exponential. However, in a real system these principals are not generally valid, because time distributions are seldom exponential, while each job has its own service parameters for each node.
There are several queueing models that have been proposed in order to represent more accurately real computer systems. The truth is that all of these models expand and actually constitute special cases of the model developed by Baskett, Chandy, Muntz and Palacios (BCMP) in 1975.
The model described by Baskett, Chandy, Muntz and Palacios, in its general form, does not contain the two principal limitations of the previous models. Jobs are divided into R distinct classes, while each class may have different parameters such as a unique set of transition frequencies which has the form:
This expression describes the fraction of departures from node i in class r that go next to node j in class s. The above transition frequency expresses also an additional property of the jobs, which is that they can change class while they are moving through the system. A state of this kind of queueing network represents a distribution of jobs over classes and nodes.
A state of a network which contains M nodes can be expressed as a vector:
, where each component of is also a vector:
.
These vectors express the number of class r jobs at node .
These kind of queueing networks, that have been described by Baskett, Chandy, Muntz and Palacios in 1975, are also called BCMP networks (or models), and they constitute, until now, the most commonly used types of models for system performance evaluation. They can describe accurately a large variety of computer systems, much wider than the restrictive and simple models, introduced by Jackson and Gordon and Newell, are capable to. In general, a BCMP model is capable to describe an open, closed or mixed queueing network. A mixed system is closed for a specific set of classes of jobs and open for the other classes of jobs. However, the complexity of these systems, makes the calculation of their performance measures much more expensive in computing resources.
In a BCMP model each server can have one of the following queue disciplines (service policies):
- FCFS (First Come First Served): The job that arrives earlier in the queue is always served earlier than the one that arrives later, independently to the class that this job belongs to.
- PSHR (Processor Sharing): A single server chooses, using the round robin method, and serves one of the jobs that are waiting in its queue, for a quantum of time.
- ISRV (Infinite Server): An Infinite Server or Delay Centre is considered to be a node in which the number of servers it contains is always at least the maximum number of jobs coming at the node. In this way there is never a job waiting to be served in the queue.
- LCFS (Last Come First Served): The most recent job that arrives to the node, is the one that is served first.
In order to solve queueing network models it is essential to make some assumptions about the properties that influence the transition of the model, from one state to another. Buzen and Denning have studied these necessary assumptions that need to be made in order to simplify the procedure of balancing the flow into and out of system states. In order to simplify this procedure, it is assumed that there are no multiple job arrivals or departures in the system. This assumption, known as “one step transitions”, implies that the rate of job flow between the nodes is the only parameter that determines the rate a system enters or leaves a state. A second assumption declares that the queue length of the specific node is only a parameter that influences the flows out of a node, and not the queue lengths of the other nodes of the system. This assumption is known as “homogeneity”. []
References
[] Steven C. Bruell and Gianfranco Balbo, “Computational Algorithms for Closed Queueing Networks”, Elsevier North Holland, Inc. 1980.
Fig 1.0
In order to overcome this limitation, another model was created, which introduces a number of N terminals to the system, one for each user. In this model the number of jobs is N, and each of these jobs can be in one of the following states:
- “thinking” at the terminal,
- waiting in the queue of the central system,
- or being served by the central system.
Fig 1.1
However, this model, which is known as the “machine repairman” model, fails to represent the complexity of a typical computing system.
A significant step for the evolution of queueing networks occurred with Jackson's model. Jackson has developed a technique for solving a special category of queueing networks. These networks of queues were only involving a single class of jobs. There was also a distinction between open and closed queueing networks. In an open queueing network the jobs were only permitted to enter or leave the network, while in a closed queueing network the jobs were remaining inside. In Jackson's approach each node/queue contains one or more parallel servers, while the service time distributions, for each of these nodes, are assumed to be exponential.
A state of a network which contains M nodes/stations can be expressed a vector:
and expresses the number of jobs at node/station i.
The objective of analysing a Jackson's queueing network model is to find a closed form expression for the equilibrium probability of finding the network in any given state , which is:
Knowing the probabilities of this equilibrium state, several performance variables for the evaluation of the system can be derived, such as the utilization and the average queue length of each node and the throughput of the workload of the network.
If is the marginal probability of finding n jobs at the node then:
The probability of a node to be busy is called utilization it is computed as:
Jackson also proved something that it is termed as a “product form” which defines that, in an open network, the equilibrium probability of state , factors into the product of each of the marginal distributions:
Later, and without knowing about the earlier work of Jackson, two scientists, Gordon and Newell, obtained the equilibrium state probabilities for closed queueing networks. Again the service time distributions for each node was assumed to be exponential. The closed network they studied had a finite number of jobs, N and a finite number of nodes, M. Additionally, each node, i, may have a finite numberof parallel identical servers which have exponentially distributed service times as well. Independently of the state of the network, one job that has just serviced from the node i will be transmitted to the node j with probability .
A state of a network which contains M nodes and N jobs can be expressed as a vector again:
Now, because the network is closed, the sum of all the , must be N.
Gordon and Newell proved what exactly Jackson has proved for the open queueing networks some time ago, but applied to closed queueing networks this time. The equilibrium probability of state in a closed queueing network also satisfies a “product form”:
where G(N) is the normalisation constant which has been chosen to make all the feasible state probabilities sum to one:
where depends on the characteristics of the node:
, if node i is load dependent
OR , if node i is load independent.
In order to evaluate the normalisation constant G(N), a large number of operations is required. Thus, though the above product form solutions are easily expressed, the evaluation of the state probabilities can be computationally expensive. Additionally, the accuracy of the results is in doubt because there is a large number of finite precision arithmetic operations involved.
An efficient computational algorithm for evaluating the normalisation constant was developed by Buzen in the early seventies. Moreover, he studied ways of obtaining the performance measures of a queueing network, like utilisation, throughput and queue length, creating algorithms with simple functions that involve the normalisation constant and the model parameters. In that way more complex systems could be effectively studied.
Continuing his research, Buzen, tried to construct models of queueing networks, that would represent accurately the flow of the several jobs through a computer system. The result of his research was the construction of the “Central Server Model” of a multiprogramming system. In this model the nodes represent the processors of the computer system, while the several jobs/programs move in the direction of the arrows. A Job in this model can visit one node/processor at a time and when one job arrives to a node/processor which is busy, then a queue is formed. Using this type of queueing network models, bottlenecks of a system could be traced easier.
Fig 1.2.
There are two principal limitations to the models that Jackson and Gordon and Newell have studied. In these queueing networks, there is only one class of jobs, and all service time distributions are exponential. However, in a real system these principals are not generally valid, because time distributions are seldom exponential, while each job has its own service parameters for each node.
There are several queueing models that have been proposed in order to represent more accurately real computer systems. The truth is that all of these models expand and actually constitute special cases of the model developed by Baskett, Chandy, Muntz and Palacios (BCMP) in 1975.
The model described by Baskett, Chandy, Muntz and Palacios, in its general form, does not contain the two principal limitations of the previous models. Jobs are divided into R distinct classes, while each class may have different parameters such as a unique set of transition frequencies which has the form:
This expression describes the fraction of departures from node i in class r that go next to node j in class s. The above transition frequency expresses also an additional property of the jobs, which is that they can change class while they are moving through the system. A state of this kind of queueing network represents a distribution of jobs over classes and nodes.
A state of a network which contains M nodes can be expressed as a vector:
, where each component of is also a vector:
.
These vectors express the number of class r jobs at node .
These kind of queueing networks, that have been described by Baskett, Chandy, Muntz and Palacios in 1975, are also called BCMP networks (or models), and they constitute, until now, the most commonly used types of models for system performance evaluation. They can describe accurately a large variety of computer systems, much wider than the restrictive and simple models, introduced by Jackson and Gordon and Newell, are capable to. In general, a BCMP model is capable to describe an open, closed or mixed queueing network. A mixed system is closed for a specific set of classes of jobs and open for the other classes of jobs. However, the complexity of these systems, makes the calculation of their performance measures much more expensive in computing resources.
In a BCMP model each server can have one of the following queue disciplines (service policies):
- FCFS (First Come First Served): The job that arrives earlier in the queue is always served earlier than the one that arrives later, independently to the class that this job belongs to.
- PSHR (Processor Sharing): A single server chooses, using the round robin method, and serves one of the jobs that are waiting in its queue, for a quantum of time.
- ISRV (Infinite Server): An Infinite Server or Delay Centre is considered to be a node in which the number of servers it contains is always at least the maximum number of jobs coming at the node. In this way there is never a job waiting to be served in the queue.
- LCFS (Last Come First Served): The most recent job that arrives to the node, is the one that is served first.
In order to solve queueing network models it is essential to make some assumptions about the properties that influence the transition of the model, from one state to another. Buzen and Denning have studied these necessary assumptions that need to be made in order to simplify the procedure of balancing the flow into and out of system states. In order to simplify this procedure, it is assumed that there are no multiple job arrivals or departures in the system. This assumption, known as “one step transitions”, implies that the rate of job flow between the nodes is the only parameter that determines the rate a system enters or leaves a state. A second assumption declares that the queue length of the specific node is only a parameter that influences the flows out of a node, and not the queue lengths of the other nodes of the system. This assumption is known as “homogeneity”. []
References
[] Steven C. Bruell and Gianfranco Balbo, “Computational Algorithms for Closed Queueing Networks”, Elsevier North Holland, Inc. 1980.
Saturday, 18 August 2007
Jung and Visualization
The Java Universal Network/Graph Framework (JUNG) is a freely provided, under the BSD open-source licence, software library that offers a common and extendible language for the manipulation, analysis, and visualization of graphs and networks. It has been developed in Java programming language so it is portable and extendible. JUNG is not itself a stand alone tool, but it constitutes an object oriented framework which can be used to support the construction of applications and tools for manipulating data that can be represented as a graph or network. []
The architecture of this library supports a variety of representations of entities and their relations. It has been designed to support directed and undirected graphs, graphs with parallel edges, multi-modal graphs, and hypergraphs. Moreover, it provides a built in mechanism for annotating graphs, entities (Vertices) and relations (Edges) with data structures (metadata). It supports complex filtering mechanisms, which extract subsets of a network, and dynamic graphs. Additionally, the JUNG software includes implementations of a number of algorithms. These algorithms are used in exploratory data analysis, social network analysis, graph theory and machine learning. They can be used for clustering, optimization, statistical analysis, decomposition, random graph generation and calculation of network distances, flows, and ranking measures.
The developers of JUNG, in order to reduce the duplication of effort required for implementing certain functions that their software needed to perform, have used other, also platform-independent, Java libraries. Jung uses the following Java libraries:
Commons Collections (Apache Jakarta Project (2004)) library which enhances the basic Java API for collections of objects.
Colt (CERN (2004)) is a set of libraries for matrix algebra functions and generally for high-performance scientific and technical computing.
Xerces (Apache XML Project (2004)) is the common library, used for implementing the GraphML I/O capabilities, for XML parsing.
Building JUNG-based applications will require all of the above, plus the Java SDK (version 1.3 or later).
Except from the powerful JUNG API, there exist a number of other packages and tools for manipulating and visualizing network data such as the: Pajek (Batagelj and Mrva (2004)), the UCINET (Borgatti, Everett, and Freeman (2004)), the R (R Development Core Team (2004)) with sna (Butts (2004)), the NetMiner (Cyram Company, Ltd. (2004)), the MultiNet (Richards and Seary (2004)), the GFC (IBM Corporation (1999), the StOCNET (Boer, Huisman, Snijders, and Zeggelink (2003)), the Info Vis Cyberinfrastructure (Penumarthy, Mane, and Borner (2004)), the Info Vis (Fekete (2004)), the Visone (Brandes and Wagner (2003)), the Jgraph (Alder (2005)), The yFiles (yWorks (2004)) and the Boost (Siek, Lee, and Lumsdaine (2001)). []
Additionally, some extra open source projects for graph or network visualization written in Java are the: GiNY, HyperGraph, JDigraph, WilmaScope, JGraphT, TouchGraph, GVF, JgraphEd, VGJ, GUESS, Zoomgraph, Walrus, Prefuse, ZVTM, Large Graph Layout, NV2D, Otter, Gravisto, Linguine Maps, JIGGLE, Zest, Relo and SoNIA.
It should be reported that initially, the JGraphT software was selected for creating the GUI for the present project. JGraphT seems to be a simple and helpful software for creating graphical representations of network data, only if these data are already known. It does not seem to be appropriate for creating dynamic graphs. That was the reason why the powerful and flexible JUNG API was used in the end.
There is a long list of software projects that use JUNG for network data visualization. Some of them are listed here: Djinn (Fabien Benolt), RDF Gravity (Sunil Goyal, Rupert Westenthaler), GUESS (Eyatan Adar, David Feinberg), ADAPTNet (Michal Okoniewski, Tim Yates), Augur (Jon Froehlich), Ariadne (Cleidson de Souza et al.), Netsight (Yan-Biao Boey, Joshua O' Madadhain, Scott White, Padhraic Smyth), InfoVis CyberInfrastructure (Penumarthy, Mane, and Borner (2004)), PWComp (Joshua Adelman, Josh England, Alex Chen), Google Cartography (Richard Jones), GINY (Rowan Christmas), GraphExplore (Quanli Wang), TOTEM (Toolbox For Traffic Engineering Methods (S. Balon, O. Delcourt, J. Lepropre and F. Skivee), D2K (“Data to Knowledge”), graphBuilder (Ben Raymond), Semiophore, Xholon (Ken Webb), and Flink (Peter Mika). []
Additionally, there is another long list with papers and research projects that involve the JUNG software, which has been decided not to be mentioned. JUNG is a dynamically involving software and its second edition, called Jung 2.0, is currently in its alpha version. It is available to be downloaded from the Open Source Technology Group (SourceForge.net). In this point it is worthwhile to mention that during the first period of developing the GUI for the PDQ library, an effort has been made in order to implement the GUI using the newer second version of JUNG, instead of the 1.7.6. version. Using the second version of the software initially seemed to be an interesting challenge, since it is difficult to combine one quite under engineered software like PDQ with one like Jung2.0, which in its first sight looks quite over engineered. Finally, it was decided that the, very well done, 1.7.6 version could provide enough help to create a GUI for the Pretty Damn Quick library, since Jung2.0 is still in its alpha version and probably some changes will take place in the near future.
References
J O 'Madadhain, D Fisher, S White, Y Boey, “The JUNG (Java Universal Network/Graph) Framework”, (preprint by http://jung.sourceforge.net), 2007
The architecture of this library supports a variety of representations of entities and their relations. It has been designed to support directed and undirected graphs, graphs with parallel edges, multi-modal graphs, and hypergraphs. Moreover, it provides a built in mechanism for annotating graphs, entities (Vertices) and relations (Edges) with data structures (metadata). It supports complex filtering mechanisms, which extract subsets of a network, and dynamic graphs. Additionally, the JUNG software includes implementations of a number of algorithms. These algorithms are used in exploratory data analysis, social network analysis, graph theory and machine learning. They can be used for clustering, optimization, statistical analysis, decomposition, random graph generation and calculation of network distances, flows, and ranking measures.
The developers of JUNG, in order to reduce the duplication of effort required for implementing certain functions that their software needed to perform, have used other, also platform-independent, Java libraries. Jung uses the following Java libraries:
Commons Collections (Apache Jakarta Project (2004)) library which enhances the basic Java API for collections of objects.
Colt (CERN (2004)) is a set of libraries for matrix algebra functions and generally for high-performance scientific and technical computing.
Xerces (Apache XML Project (2004)) is the common library, used for implementing the GraphML I/O capabilities, for XML parsing.
Building JUNG-based applications will require all of the above, plus the Java SDK (version 1.3 or later).
Except from the powerful JUNG API, there exist a number of other packages and tools for manipulating and visualizing network data such as the: Pajek (Batagelj and Mrva (2004)), the UCINET (Borgatti, Everett, and Freeman (2004)), the R (R Development Core Team (2004)) with sna (Butts (2004)), the NetMiner (Cyram Company, Ltd. (2004)), the MultiNet (Richards and Seary (2004)), the GFC (IBM Corporation (1999), the StOCNET (Boer, Huisman, Snijders, and Zeggelink (2003)), the Info Vis Cyberinfrastructure (Penumarthy, Mane, and Borner (2004)), the Info Vis (Fekete (2004)), the Visone (Brandes and Wagner (2003)), the Jgraph (Alder (2005)), The yFiles (yWorks (2004)) and the Boost (Siek, Lee, and Lumsdaine (2001)). []
Additionally, some extra open source projects for graph or network visualization written in Java are the: GiNY, HyperGraph, JDigraph, WilmaScope, JGraphT, TouchGraph, GVF, JgraphEd, VGJ, GUESS, Zoomgraph, Walrus, Prefuse, ZVTM, Large Graph Layout, NV2D, Otter, Gravisto, Linguine Maps, JIGGLE, Zest, Relo and SoNIA.
It should be reported that initially, the JGraphT software was selected for creating the GUI for the present project. JGraphT seems to be a simple and helpful software for creating graphical representations of network data, only if these data are already known. It does not seem to be appropriate for creating dynamic graphs. That was the reason why the powerful and flexible JUNG API was used in the end.
There is a long list of software projects that use JUNG for network data visualization. Some of them are listed here: Djinn (Fabien Benolt), RDF Gravity (Sunil Goyal, Rupert Westenthaler), GUESS (Eyatan Adar, David Feinberg), ADAPTNet (Michal Okoniewski, Tim Yates), Augur (Jon Froehlich), Ariadne (Cleidson de Souza et al.), Netsight (Yan-Biao Boey, Joshua O' Madadhain, Scott White, Padhraic Smyth), InfoVis CyberInfrastructure (Penumarthy, Mane, and Borner (2004)), PWComp (Joshua Adelman, Josh England, Alex Chen), Google Cartography (Richard Jones), GINY (Rowan Christmas), GraphExplore (Quanli Wang), TOTEM (Toolbox For Traffic Engineering Methods (S. Balon, O. Delcourt, J. Lepropre and F. Skivee), D2K (“Data to Knowledge”), graphBuilder (Ben Raymond), Semiophore, Xholon (Ken Webb), and Flink (Peter Mika). []
Additionally, there is another long list with papers and research projects that involve the JUNG software, which has been decided not to be mentioned. JUNG is a dynamically involving software and its second edition, called Jung 2.0, is currently in its alpha version. It is available to be downloaded from the Open Source Technology Group (SourceForge.net). In this point it is worthwhile to mention that during the first period of developing the GUI for the PDQ library, an effort has been made in order to implement the GUI using the newer second version of JUNG, instead of the 1.7.6. version. Using the second version of the software initially seemed to be an interesting challenge, since it is difficult to combine one quite under engineered software like PDQ with one like Jung2.0, which in its first sight looks quite over engineered. Finally, it was decided that the, very well done, 1.7.6 version could provide enough help to create a GUI for the Pretty Damn Quick library, since Jung2.0 is still in its alpha version and probably some changes will take place in the near future.
References
J O 'Madadhain, D Fisher, S White, Y Boey, “The JUNG (Java Universal Network/Graph) Framework”, (preprint by http://jung.sourceforge.net), 2007
Friday, 17 August 2007
Millennium Performance Problems and the contribution of the Jung API
The first of the Millennium Performance Problems, as expressed by Neil J. Gunther, is the performance Visualization. The idea is to find alternative ways of representing performance data. The traditional reports which contain a set of tables with numbers and a variety of two dimensional diagrams, seem not to provide the expected help, when a special set of problems is involved. These are usually complex problems which need a special way of representation in order to become comprehensive from the human brain, and problems that demand a special analysis and solution methodology. Alternative ways of data representation should be considered which would take into account the new discoveries that have been made recently around the humans vision area of research, and generally around the way the human brain can perceive some certain types of information and knowledge. For this reason physicists and biologists use, and also are involved into, projects that have to do with animation and special GUIs, that are used to represent complex data in ways that could help their users to derive and solve problems easier. [Neil J. Gunther]
Thus, in order to achieve more effective representations of complex network data, several graphical analysis tools and frameworks have been developed to provide help to programmers and consultants. In this section, a quick reference to the several open source and freely available software projects, that have been developed for this purpose, will be made, and a short presentation of the Jung API capabilities, will be provided. But in order to complete the reference to the Millennium Performance Problems, it would be better to provide a short description of the other four remaining problems first, as expresses by Neil J. Gunther.
The second millennium performance problem has to do with the so called “Self-instrumented Applications”. It is well-known that object orientation logic has improved the way software is managed and organised. Primarily, this is achieved because in object orientated programming, code can be reused and maintained easier. The idea behind the self-instrumented applications is based on the same rationale, applied to the programmers or generally to the productive resources, themselves. These kind of objects should be then added to the object library as well. In this way, the performance analyst could have the ability to turn objects on and off selectively, achieving a more flexible and probably more efficient management, since paths to find bottlenecks through applications code, could be traced easier. Obviously, applying this logic could contribute to the analysis of production systems
The Von Neumann Bottleneck is the third millennium performance problem. Parallelism, is the technical term for doing more than one thing at once in order to achieve computational efficiency. In parallel computing centres, where supercomputers made either with clusters or with Grid computing techniques, the maximum efficiency for the system can be achieved if only a single thread is running in each of the processing units and this thread is not terminated until the end of the computation. Unfortunately to achieve practically something like that or to achieve general-purpose parallelism, seems to be an utopia for performance analysts, despite the intense effort that has been done over the last two decades. The paradigm of the human brain architecture has lead to the idea of using neural networks to achieve a higher degree of parallelism. Another idea is to construct quantum computers.
The fourth millennium performance problem as expressed by Neil J. Gunther has to do with the performance analysis of the Internet. Internet communication is based mostly on the concept of receiving and transmitting information coded as a packet of data. Internet packet analysis, which has taken place over the last fifteen years, showed that “long-term correlations can persist over several orders of magnitude in time” [Neil J. Gunther]. It is true that in the packet level, all the conventional queueing theory techniques are no longer valid. Packet arrivals don't follow the Poison distribution, while there are no constant service times. Thus the question is how the Internet could be modelled. A very challenging task could be the development of an Internet Simulator or a new theory for expressing complex systems of the same type.
The final millennium challenge for the performance analysts has to do with the performance analysis of Quantum Computers. The development of Quantum Computers may still sound like a science fiction story, but it is true that there have already been developed and used quantum communication devices for encryption. Maybe they are still very specialised and expensive but it is expected that during the following years, devices like those will be easier and cheaper to a be produced in a large scale. The question is how the wide use of this technology will affect performance. [Neil. J Gunther]
These five millennium performance problems, as they have been expressed by Neil J. Gunther, seem to reveal some interesting points about the upcoming challenges in the area of systems performance evaluation. The present project aims to perform a research and hopefully contribute to the first of these challenges. The visualization of complex data that have to do with the relations between distributed entities expressed as networks of data, and the analysis of the several ways that this can or already has been done, could be the subject of this project.
Thus, in order to achieve more effective representations of complex network data, several graphical analysis tools and frameworks have been developed to provide help to programmers and consultants. In this section, a quick reference to the several open source and freely available software projects, that have been developed for this purpose, will be made, and a short presentation of the Jung API capabilities, will be provided. But in order to complete the reference to the Millennium Performance Problems, it would be better to provide a short description of the other four remaining problems first, as expresses by Neil J. Gunther.
The second millennium performance problem has to do with the so called “Self-instrumented Applications”. It is well-known that object orientation logic has improved the way software is managed and organised. Primarily, this is achieved because in object orientated programming, code can be reused and maintained easier. The idea behind the self-instrumented applications is based on the same rationale, applied to the programmers or generally to the productive resources, themselves. These kind of objects should be then added to the object library as well. In this way, the performance analyst could have the ability to turn objects on and off selectively, achieving a more flexible and probably more efficient management, since paths to find bottlenecks through applications code, could be traced easier. Obviously, applying this logic could contribute to the analysis of production systems
The Von Neumann Bottleneck is the third millennium performance problem. Parallelism, is the technical term for doing more than one thing at once in order to achieve computational efficiency. In parallel computing centres, where supercomputers made either with clusters or with Grid computing techniques, the maximum efficiency for the system can be achieved if only a single thread is running in each of the processing units and this thread is not terminated until the end of the computation. Unfortunately to achieve practically something like that or to achieve general-purpose parallelism, seems to be an utopia for performance analysts, despite the intense effort that has been done over the last two decades. The paradigm of the human brain architecture has lead to the idea of using neural networks to achieve a higher degree of parallelism. Another idea is to construct quantum computers.
The fourth millennium performance problem as expressed by Neil J. Gunther has to do with the performance analysis of the Internet. Internet communication is based mostly on the concept of receiving and transmitting information coded as a packet of data. Internet packet analysis, which has taken place over the last fifteen years, showed that “long-term correlations can persist over several orders of magnitude in time” [Neil J. Gunther]. It is true that in the packet level, all the conventional queueing theory techniques are no longer valid. Packet arrivals don't follow the Poison distribution, while there are no constant service times. Thus the question is how the Internet could be modelled. A very challenging task could be the development of an Internet Simulator or a new theory for expressing complex systems of the same type.
The final millennium challenge for the performance analysts has to do with the performance analysis of Quantum Computers. The development of Quantum Computers may still sound like a science fiction story, but it is true that there have already been developed and used quantum communication devices for encryption. Maybe they are still very specialised and expensive but it is expected that during the following years, devices like those will be easier and cheaper to a be produced in a large scale. The question is how the wide use of this technology will affect performance. [Neil. J Gunther]
These five millennium performance problems, as they have been expressed by Neil J. Gunther, seem to reveal some interesting points about the upcoming challenges in the area of systems performance evaluation. The present project aims to perform a research and hopefully contribute to the first of these challenges. The visualization of complex data that have to do with the relations between distributed entities expressed as networks of data, and the analysis of the several ways that this can or already has been done, could be the subject of this project.
User Data in Jung
There are two ways to associate data with graphs, edges or vertices.
1st. Class Extension, creating a class that extends one of the Graph, Vertice or Edge implementations.
2nd. Jung Annotation, using the existing built-in mechanism (the UserData class) for annotating graph elements with data. Several methods are used to manipulate these data.
I think that the first way is more appropriate for the needs of my project.
1st. Class Extension, creating a class that extends one of the Graph, Vertice or Edge implementations.
2nd. Jung Annotation, using the existing built-in mechanism (the UserData class) for annotating graph elements with data. Several methods are used to manipulate these data.
I think that the first way is more appropriate for the needs of my project.
Subscribe to:
Posts (Atom)