OSA's Digital Library

Journal of Optical Communications and Networking

  • Editor: Richard A. Linke
  • Vol. 6, Iss. 5, May 1, 2007
  • pp: 465–481

nD-RAPID: a multidimensional scalable fault-tolerant optoelectronic interconnection for high-performance computing systems

Chander Kochar, Avinash Kodi, and Ahmed Louri


Journal of Optical Networking, Vol. 6, Issue 5, pp. 465-481 (2007)
http://dx.doi.org/10.1364/JON.6.000465


Abstract

Feature Issue on Photonics in Switching

The increasing demand for bandwidth coupled with saturating electrical systems is leading the drive for optics as an interconnect technology. High-performance computing systems (HPCS) consist of a large number of components such as processors, memories, and interconnection links. As the number of components in HPCS increases, the probability of a failure also increases. Therefore, it becomes imperative that the system be fault tolerant to ensure high availability even in the presence of faults. We propose a multidimensional optoelectronic architecture, nD-RAPID (reconfigurable and scalable all-photonic interconnect for distributed and parallel systems), where n can be 1, 2, or 3. nD-RAPID provides high bandwidth, low latency, dynamic reconfiguration, and fault tolerance. While designing the fault-tolerant routing algorithm, we have tried to ensure that it provides optimum performance in the absence of faults, shows minimal degradation in the presence of faults, and can tolerate a reasonable number of faults. In the presence of faults the on-board switching mechanism dynamically reconfigures itself to reroute packets along nonfaulty links. Extensive simulation results are presented that compare nD-RAPID with other popular HPCS topologies.

© 2007 Optical Society of America

OCIS Codes
(200.0200) Optics in computing : Optics in computing
(200.4650) Optics in computing : Optical interconnects

ToC Category:
Photonics in Switching

History
Original Manuscript: October 2, 2006
Revised Manuscript: December 20, 2006
Manuscript Accepted: February 28, 2007
Published: April 23, 2007

Virtual Issues
Photonics in Switching (2006) Journal of Optical Networking

Citation
Chander Kochar, Avinash Kodi, and Ahmed Louri, "nD-RAPID: a multidimensional scalable fault-tolerant optoelectronic interconnection for high-performance computing systems," J. Opt. Netw. 6, 465-481 (2007)
http://www.opticsinfobase.org/jocn/abstract.cfm?URI=jon-6-5-465


