Tim Mattox's Publications

As time permits, I will collect electronic copies of my publications and link them here. In the interim, you can likely find them via Google.

My Ph.D. Dissertation:

Exploiting Sparseness of Communication Patterns for the Design of Networks in Massively Parallel Supercomputers

Ph.D., Electrical and Computer Engineering, University of Kentucky, August 2006.

Abstract:

A limited set of Processing Element (PE) pairs in a parallel computer cover the internal communications of scalable parallel programs. We take advantage of this property using the concept of Sparse Flat Neighborhood Networks (Sparse FNNs). Sparse FNNs are network designs that provide single-switch latency and full wire bandwidth for each specified PE pair, despite using relatively few network interfaces per PE and switches that have far fewer ports than there are PEs. This dissertation discusses the design problem, runtime support, and working prototype (KASY0) for Sparse FNNs. KASY0 not only demonstrated the claimed properties, but also set world records for its price/performance and performance on a specific application.

Parallel supercomputers execute many portions of an application simultaneously. For scalable programs, the more PEs the system has, the greater the potential speedup. Portions executing on different PEs may be able to work independently for short periods, but the performance desired might not be achieved due to delays in communication between PEs. The set of PE pairs that will communicate often is both predictable and small relative to the number of possible PE pairings. This sparseness property can be exploited in the design and implementation of networks for massively parallel supercomputers.

The sparseness of communicating pairs is rooted in the fact that each of the human-designed communication patterns commonly used in parallel programs has the property that the number of communicating pairs grows relatively slowly as the number of PEs is increased. Additionally, the number of pairs in the union of all communication patterns used in a suite of parallel programs grows surprisingly slowly due to pair synergy: the same pair often appears in multiple communication patterns. Detailed analysis of communication patterns clearly shows that the number of PE pairs actually communicating is very sparse, although the structure of the sparseness can be complex.
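
As an illustration of the pair-counting argument above (not taken from the dissertation), the following minimal sketch counts the communicating PE pairs for three textbook patterns and for their union. The choice of patterns (ring, hypercube, 2D torus), the function names, and the 64-PE size are all assumptions made for this example; the dissertation analyzes its own set of patterns.

```python
def ring_pairs(n):
    """Ring: PE i communicates with PE (i + 1) mod n."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def hypercube_pairs(n):
    """Hypercube: PE i communicates with each PE whose index differs
    in exactly one bit. Assumes n is a power of two."""
    pairs = set()
    bit = 1
    while bit < n:
        for i in range(n):
            pairs.add(frozenset((i, i ^ bit)))
        bit <<= 1
    return pairs


def torus_2d_pairs(rows, cols):
    """2D torus: PE (r, c) communicates with its right and lower
    neighbors, with wraparound at the edges."""
    pairs = set()
    for r in range(rows):
        for c in range(cols):
            me = r * cols + c
            pairs.add(frozenset((me, r * cols + (c + 1) % cols)))
            pairs.add(frozenset((me, ((r + 1) % rows) * cols + c)))
    return pairs


def summarize(n=64, rows=8, cols=8):
    """Return pair counts per pattern, for the union, and for all possible pairs."""
    patterns = {
        "ring": ring_pairs(n),
        "hypercube": hypercube_pairs(n),
        "torus": torus_2d_pairs(rows, cols),
    }
    union = set().union(*patterns.values())
    counts = {name: len(pairs) for name, pairs in patterns.items()}
    counts["union"] = len(union)
    counts["all_possible"] = n * (n - 1) // 2
    return counts


if __name__ == "__main__":
    for name, count in summarize().items():
        print(f"{name:>12}: {count}")
```

In this toy setting the union is smaller than the sum of the individual pattern sizes because some pairs (for example, PE 0 and PE 1) appear in more than one pattern, which is the pair synergy effect described above. Both the union and each pattern are far smaller than the number of all possible pairs.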

Download the 2.7MB PDF locally or from the University of Kentucky. An official copy can be obtained via http://hdl.handle.net/10225/280

My Master's Thesis:

Synchronous Aggregate Communication Architecture for MIMD Parallel Processing

M.S.E.E., School of Electrical and Computer Engineering, Purdue University, August 1997.

Abstract:

A multitude of different parallel architectures have been proposed, and each works well for applications with the appropriate types of parallelism. However, to achieve the best possible speedup for a wide range of parallel applications, a parallel computer's hardware must be able to make effective use of most types of parallelism.

This thesis suggests that the most fundamental flaws of MIMD architecture can be corrected by the addition of a simple synchronous aggregate communication system. After reviewing the relationship between some basic architectural characteristics and the types of parallelism that they can support, a small set of synchronous aggregate communication operations are defined. The implementation and performance of the PAPERS1 prototype hardware, a very simple synchronous aggregate communication system for a MIMD cluster of PCs, is discussed in detail.

PDF and PostScript Level-2 versions:
PDF (500K, 130 pages), all.ps (656K, 130 pages), body.ps (503K, 76 pages), appendix.ps (161K, 54 pages).

My Publications (Check my CV for an up-to-date list):

A bibliography of publications I have written with my academic advisor and others. Please note that some of these links are to compressed (.Z) PostScript files. Save these files locally, and then uncompress them.
  1. Timothy E. Dowling, Mary E. Bradley, Edward Colón, John Kramer, Raymond P. LeBeau, Grace C.H. Lee, Timothy I. Mattox, Raul Morales-Juberías, Csaba J. Palotai, Vimal K. Parimi, and Adam P. Showman, "The EPIC Atmospheric Model with an Isentropic/Terrain-Following Hybrid Vertical Coordinate," Icarus, 182(1):259-273, May 2006. 376KB PDF.
  2. Timothy I. Mattox, Henry G. Dietz, and William R. Dieter, "Sparse Flat Neighborhood Networks (SFNNs): Scalable Guaranteed Pairwise Bandwidth and Unit Latency," in the Proceedings of the Fifth Workshop on Massively Parallel Processing (WMPP'05) held in conjunction with the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2005), Denver, CO, USA, April, 2005. Preprints are available as 182KB PDF and 1MB PS versions for personal use only. Slides from the talk are available as 106KB PDF and 140KB PPT versions.
  3. Th. Hauser, T.I. Mattox, R.P. LeBeau, H.G. Dietz and P.G. Huang, "Code Optimizations for Complex Microprocessors Applied to CFD Software," SIAM Journal on Scientific Computing, 25(4):1461-1477, 2004. (various)
  4. H.G. Dietz and T.I. Mattox, "Compiler Optimizations Using Data Compression To Decrease Address Reference Entropy," 15th Workshop on Languages and Compilers for Parallel Computing (LCPC2002), College Park, Maryland, USA, July 25-27, 2002. (.pdf 98KB)
  5. Thomas Hauser, Timothy I. Mattox, Raymond P. LeBeau, Henry G. Dietz and P. George Huang, "High-Cost CFD on a Low-Cost Cluster," Proceedings of the IEEE/ACM SC2000 conference, Dallas, Texas, USA, November 4-10, 2000. Received Gordon Bell Prize Honorable Mention, Price/Performance category. Preprints are available as .ps 13MB and .pdf 4MB versions for personal use only.
  6. H. G. Dietz and T. I. Mattox, "KLAT2's Flat Neighborhood Network," in the Proceedings of the Extreme Linux track of the 4th Annual Linux Showcase (ALS2000), Atlanta, GA, USA, October 12, 2000. (.pdf)
  7. H. G. Dietz and T. I. Mattox, "Compiler Techniques For Flat Neighborhood Networks," 13th International Workshop on Languages and Compilers for Parallel Computing 2000 (LCPC00), IBM T.J. Watson Research Center, Yorktown Heights, New York, USA, August 11, 2000. (.pdf 1.6M, .ps 5.4M)
  8. H. Dietz and T. Mattox, "Inside The KLAT2 Supercomputer: The Flat Neighborhood Network & 3DNow!", Ars Technica, June 2000. (http://www.arstechnica.com/cpu/2q00/klat2/klat2-1.html)
  9. H.G. Dietz, T.I. Mattox, and G. Krishnamurthy, "The Aggregate Function API: It's Not Just For PAPERS Anymore," to appear in 1997 Workshop on Languages and Compilers for Parallel Computing, University of Minnesota, Minneapolis, MN, August 1997. (.html, .ps)
  10. T.I. Mattox, Synchronous Aggregate Communication Architecture for MIMD Parallel Processing, Master's Thesis, School of Electrical and Computer Engineering, Purdue University, August 1997. (all.pdf, all.ps, body.ps, appendix.ps)
  11. H.G. Dietz and T.I. Mattox, "Managing Polyatomic Coherence and Races with Replicated Shared Memory," to appear in the special issue on DSM (distributed shared memory) and related issues, IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, pp. 53-58, March 1997. (.pdf)
  12. R. Hoare, T.I. Mattox, and H. Dietz, "TTL-PAPERS 960801: The Modularly Scalable, Field Upgradable, Implementation of Purdue's Adapter for Parallel Execution and Rapid Synchronization," Tech Report http://aggregate.org/AFN/960801/Index.html
  13. R. Hoare, H. Dietz, T. Mattox, and S. Kim, "Bitwise Aggregate Networks," In Proceedings of The Eighth IEEE Symposium on Parallel and Distributed Processing (SPDP'96), New Orleans, Louisiana, October 1996. (.ps)
  14. H. G. Dietz, R. Hoare, and T. Mattox, "A Fine-Grain Parallel Architecture Based On Barrier Synchronization," Proceedings of the 1996 International Conference on Parallel Processing, vol. I, pp. 247-250, Bloomington, Illinois, August 1996. (.ps)
  15. Henry G. Dietz, T. M. Chung, and Timothy I. Mattox. "A parallel processing support library based on synchronized aggregate communication," In C.-H. Huang, P. Sadayappan, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing, 8th International Workshop (LCPC'95), volume 1033 of Lecture Notes in Computer Science, pages 254-268, Columbus, OH, USA, 1996. Springer-Verlag. (.html, .ps)
  16. H.G. Dietz, T.M. Chung, T. Mattox, and T. Muhammad, "A synchronization and aggregate communication library for PAPERS clusters," Technical Report http://aggregate.org/TechPub/TR19950131/tr950131.html, School of Electrical Engineering, Purdue University, West Lafayette, IN, January 1995.
  17. H. G. Dietz, T. M. Chung, T. I. Mattox, and T. Muhammad, "Purdue's Adapter for Parallel Execution and Rapid Synchronization: The TTL_PAPERS Design," Technical Report http://aggregate.org/TechPub/ICPP95/icpp95.html, School of Electrical Engineering, Purdue University, West Lafayette, IN, January 1995.
  18. H.G. Dietz, T. Muhammad, and T. Mattox, "TTL Implementation of Purdue's Adapter for Parallel Execution and Rapid Synchronization," Technical Report http://aggregate.org/TechPub/super4.pdf, School of Electrical Engineering, Purdue University, West Lafayette, IN, December 1994.
  19. Henry G. Dietz, William E. Cohen, T. Muhammad, and Timothy I. Mattox, "Compiler techniques for fine-grain execution on workstation clusters using PAPERS," In K. Pingali, U. Banerjee, D. Gelernter, A. Nicolau, and D.A. Padua, editors, Languages and Compilers for Parallel Computing, 7th International Workshop (LCPC'94), volume 892 of Lecture Notes in Computer Science, pages 31-45, Ithaca, NY, 1995. Springer-Verlag. (.ps)
  20. H. G. Dietz, T. Muhammad, J. B. Sponaugle, and T. Mattox, "PAPERS: Purdue's Adapter for Parallel Execution and Rapid Synchronization," Purdue University School of Electrical Engineering, Technical Report TR-EE 94-11, March 1994. (.ps.Z)