As time permits, I will collect electronic copies of my publications
and link them here. In the interim, you can likely find them via Google.
Exploiting Sparseness of Communication Patterns for the Design of
Networks in Massively Parallel Supercomputers
Ph.D., Electrical and Computer Engineering, University of Kentucky,
August 2006.
Abstract:
A limited set of Processing Element (PE) pairs in a parallel computer cover the internal communications of scalable parallel programs. We take advantage of this property using the concept of Sparse Flat Neighborhood Networks (Sparse FNNs). Sparse FNNs are network designs that provide single-switch latency and full wire bandwidth for each specified PE pair, despite using relatively few network interfaces per PE and switches that have far fewer ports than there are PEs. This dissertation discusses the design problem, runtime support, and working prototype (KASY0) for Sparse FNNs. KASY0 not only demonstrated the claimed properties, but also set world records for its price/performance and performance on a specific application.
Parallel supercomputers execute many portions of an application simultaneously. For scalable programs, the more PEs the system has, the greater the potential speedup. Portions executing on different PEs may be able to work independently for short periods, but the performance desired might not be achieved due to delays in communication between PEs. The set of PE pairs that will communicate often is both predictable and small relative to the number of possible PE pairings. This sparseness property can be exploited in the design and implementation of networks for massively parallel supercomputers.
The sparseness of communicating pairs is rooted in the fact that each of the human-designed communication patterns commonly used in parallel programs has the property that the number of communicating pairs grows relatively slowly as the number of PEs is increased. Additionally, the number of pairs in the union of all communication patterns used in a suite of parallel programs grows surprisingly slowly due to pair synergy: the same pair often appears in multiple communication patterns. Detailed analysis of communication patterns clearly shows that the number of PE pairs actually communicating is very sparse, although the structure of the sparseness can be complex.
Download the 2.7MB PDF locally or from the University of Kentucky.
Synchronous Aggregate Communication Architecture for MIMD Parallel Processing
M.S.E.E., School of Electrical and Computer Engineering, Purdue University,
August 1997.
Abstract:
A multitude of different parallel architectures have been proposed, and each
works well for applications with the appropriate types of parallelism.
However, to achieve the best possible speedup for a wide range of parallel
applications, a parallel computer's hardware must be able to make effective
use of most types of parallelism.
This thesis suggests that the most fundamental flaws of MIMD architecture
can be corrected by the addition of a simple synchronous aggregate
communication system. After reviewing the relationship between some basic
architectural characteristics and the types of parallelism that they can
support, a small set of synchronous aggregate communication operations are
defined. The implementation and performance of the PAPERS1 prototype
hardware, a very simple synchronous aggregate communication system for a
MIMD cluster of PCs, is discussed in detail.
PDF and PostScript Level 2 versions:
PDF (500K, 130 pages),
all.ps (656K, 130 pages),
body.ps (503K, 76 pages),
appendix.ps (161K, 54 pages).
My Publications (Check my CV for an up-to-date list):
A bibliography of publications I have written with my
academic advisor
and others.
Please note that some of these links point to compressed (.Z)
PostScript files. Save these files locally, then
uncompress them.
-
Timothy E. Dowling, Mary E. Bradley, Edward Colón, John Kramer, Raymond P. LeBeau, Grace C.H. Lee, Timothy I. Mattox, Raul Morales-Juberías, Csaba J. Palotai, Vimal K. Parimi, and Adam P. Showman,
"The EPIC Atmospheric Model with an Isentropic/Terrain-Following Hybrid Vertical Coordinate,"
Icarus (in press), 2006.
-
Timothy I. Mattox, Henry G. Dietz, and William R. Dieter,
"Sparse Flat Neighborhood Networks (SFNNs):
Scalable Guaranteed Pairwise Bandwidth and Unit Latency,"
in the Proceedings of the Fifth Workshop on Massively
Parallel Processing (WMPP'05) held in conjunction with the
19th IEEE International Parallel and Distributed
Processing Symposium (IPDPS 2005), Denver, CO, USA, April, 2005.
Preprints are available as 182KB PDF and 1MB PS versions for personal use only. Slides from the talk are available as 106KB PDF and 140KB PPT versions.
-
Th. Hauser, T.I. Mattox, R.P. LeBeau, H.G. Dietz and P.G. Huang,
"Code Optimizations for Complex Microprocessors Applied to CFD Software,"
SIAM Journal on Scientific Computing, 25(4):1461-1477, 2004.
(various)
-
H.G. Dietz and T.I. Mattox,
"Compiler Optimizations Using Data Compression
To Decrease Address Reference Entropy,"
15th Workshop on Languages and Compilers for Parallel Computing (LCPC2002), College Park, Maryland, USA, July 25-27, 2002.
(.pdf 98KB)
-
Thomas Hauser, Timothy I. Mattox, Raymond P. LeBeau, Henry G. Dietz and P. George Huang,
"High-Cost CFD on a Low-Cost Cluster,"
Proceedings of the IEEE/ACM SC2000 conference,
Dallas, Texas, USA, November 4-10, 2000.
Received Gordon Bell Prize Honorable Mention, Price/Performance category.
Preprints are available as .ps (13MB) and .pdf (4MB) versions for personal use only.
-
H. G. Dietz and T. I. Mattox,
"KLAT2's Flat Neighborhood Network,"
in the Proceedings of the Extreme Linux track of the 4th Annual Linux Showcase
(ALS2000),
Atlanta, GA, USA, October 12, 2000.
(.pdf)
-
H. G. Dietz and T. I. Mattox,
"Compiler Techniques For Flat Neighborhood Networks,"
13th International Workshop on Languages and
Compilers for Parallel Computing 2000
(LCPC00),
IBM T.J. Watson Research Center, Yorktown Heights, New York, USA,
August 11, 2000.
(.pdf 1.6M,
.ps 5.4M)
-
H. Dietz and T. Mattox,
"Inside The KLAT2 Supercomputer: The Flat Neighborhood Network & 3DNow!",
Ars Technica, June 2000.
(http://www.arstechnica.com/cpu/2q00/klat2/klat2-1.html)
-
H.G. Dietz, T.I. Mattox, and G. Krishnamurthy,
"The Aggregate Function API: It's Not Just For PAPERS Anymore," to appear in
1997 Workshop on Languages and Compilers for Parallel Computing,
University of Minnesota, Minneapolis, MN, August 1997.
(.html,
.ps)
-
T.I. Mattox,
Synchronous Aggregate Communication Architecture for MIMD Parallel
Processing,
Master's Thesis, School of Electrical and Computer Engineering,
Purdue University, August 1997.
(all.pdf,
all.ps,
body.ps,
appendix.ps)
-
H.G. Dietz and T.I. Mattox, "Managing Polyatomic
Coherence and Races with Replicated Shared Memory," to
appear in the special issue on DSM (distributed shared
memory) and related issues, IEEE Computer Society
Technical Committee on Computer Architecture (TCCA) Newsletter,
pp. 53-58, March 1997. (.pdf)
-
R. Hoare, T.I. Mattox, and H. Dietz,
"TTL-PAPERS 960801: The Modularly Scalable, Field Upgradable, Implementation of Purdue's Adapter for Parallel Execution and Rapid Synchronization,"
Tech Report
http://aggregate.org/AFN/960801/Index.html
-
R. Hoare, H. Dietz, T. Mattox, and S. Kim, "Bitwise
Aggregate Networks," In Proceedings of The Eighth IEEE
Symposium on Parallel and Distributed Processing (SPDP'96),
New Orleans, Louisiana, October 1996.
(.ps)
-
H. G. Dietz, R. Hoare, and T. Mattox, "A Fine-Grain
Parallel Architecture Based On Barrier Synchronization,"
Proceedings of the 1996 International Conference on Parallel
Processing, vol. I, pp. 247-250, Bloomington, Illinois,
August 1996.
(.ps)
-
Henry G. Dietz, T. M. Chung, and Timothy I. Mattox.
"A parallel processing support library based on synchronized
aggregate communication,"
In C.-H. Huang, P. Sadayappan, U. Banerjee, D. Gelernter, A. Nicolau,
and D. Padua, editors,
Languages and Compilers for Parallel Computing, 8th International Workshop (LCPC'95), volume 1033 of
Lecture Notes in Computer Science, pages 254-268,
Columbus, OH, USA, 1996. Springer-Verlag.
(.html,
.ps)
-
H.G. Dietz, T.M. Chung, T. Mattox, and T. Muhammad,
"A synchronization and aggregate communication library for PAPERS clusters,"
Technical Report
http://aggregate.org/TechPub/TR19950131/tr950131.html,
School of Electrical Engineering, Purdue University,
West Lafayette, IN, January 1995.
-
H. G. Dietz, T. M. Chung, T. I. Mattox, and T. Muhammad,
"Purdue's Adapter for Parallel Execution and Rapid Synchronization:
The TTL_PAPERS Design,"
Technical Report
http://aggregate.org/TechPub/ICPP95/icpp95.html,
School of Electrical Engineering, Purdue University,
West Lafayette, IN, January 1995.
-
H.G. Dietz, T. Muhammad, and T. Mattox,
"TTL Implementation of Purdue's Adapter for Parallel Execution
and Rapid Synchronization,"
Technical Report
http://aggregate.org/TechPub/super4.pdf,
School of Electrical Engineering, Purdue University,
West Lafayette, IN, December 1994.
-
Henry G. Dietz, William E. Cohen, T. Muhammad, and Timothy I. Mattox,
"Compiler techniques for fine-grain execution on workstation clusters using PAPERS,"
In K. Pingali, U. Banerjee, D. Gelernter, A. Nicolau, and D.A. Padua, editors,
Languages and Compilers for Parallel Computing, 7th International Workshop (LCPC'94),
volume 892 of Lecture Notes in Computer Science, pages 31-45, Ithaca, NY, 1995. Springer-Verlag.
(.ps)
-
H. G. Dietz, T. Muhammad, J. B. Sponaugle, and T. Mattox,
"PAPERS: Purdue's Adapter for Parallel Execution and Rapid
Synchronization,"
Purdue University School of Electrical Engineering,
Technical Report TR-EE 94-11, March 1994.
(.ps.Z)