What's in NPACI Rocks and How Do I Use It?

Steven K. Baum
Texas A&M University
Department of Oceanography
College Station, TX 77843-3146
(979) 548-3274
baum@stommel.tamu.edu

Mar 23, 2004

Contents

1  Introduction
    1.1  Overview
        1.1.1  Tools
    1.2  Components
2  Hints and Hacks
    2.1  Mailing List Excerpts
        2.1.1  Changing Hostnames
        2.1.2  MySQL and ROCKS
        2.1.3  NIC Cards
        2.1.4  Custom Partitioning
        2.1.5  Hyperthreading or Not
        2.1.6  Intel Compiler Migration to Nodes
3  NPACI Rocks Base/HPC Roll
    3.1  PBS
    3.2  MPICH
        3.2.1  MPD
    3.3  Cluster-Fork
    3.4  PVM
    3.5  HPL
    3.6  Ganglia
    3.7  411 Secure Information Service
    3.8  phpMyAdmin
    3.9  KickStart
    3.10  MySQL
    3.11  PVFS
4  SGE Roll - Grid Engine
5  Grid Roll - NMI
    5.1  NMI Client/Server
    5.2  Globus Toolkit
    5.3  Condor-G
    5.4  Network Weather Service
    5.5  KX.509/KCA
    5.6  GSI OpenSSH
    5.7  MyProxy
    5.8  MPICH-G2
    5.9  Grid Packaging Tools
    5.10  Gridconfig Tools
    5.11  Pubcookie
    5.12  Shibboleth
    5.13  OpenSAML
    5.14  CPM
    5.15  KX.509 and KCA
    5.16  PERMIS
    5.17  Look
    5.18  LDAP Analyzer
    5.19  Certificate Profile Registry
    5.20  eduPerson
    5.21  eduOrg
    5.22  commObject/H.350
    5.23  Practices in Directory Groups
    5.24  LDAP Recipe
    5.25  Metadirectories Best Practices
    5.26  Enterprise Directory Implementation Roadmap
    5.27  Shibboleth Architecture
    5.28  SAGE
6  Intel Roll
7  Globus Toolkit
    7.1  OGSI
    7.2  GSI - Security Infrastructure
    7.3  System Level Services
    7.4  GridFTP
    7.5  RFT
    7.6  RLS
    7.7  GRAM
    7.8  MDS
8  MPI-Enabled Packages
    8.1  ADAPTOR
    8.2  AMMPI
    8.3  APPSPACK
    8.4  ARPS
    8.5  Aztec
    8.6  BEARCLAW
    8.7  BLACS
    8.8  BLZPACK
    8.9  Cactus
    8.10  CAM
    8.11  Chombo
    8.12  CLAWPACK
    8.13  CTSim
    8.14  DAGH
    8.15  Dakota
    8.16  DPMTA
    8.17  EGO
    8.18  EPIC
    8.19  FermiQCD
    8.20  GADGET
    8.21  GASNet
    8.22  GFS
    8.23  GS2
    8.24  HDF
    8.25  HYCOM
    8.26  HYPRE
    8.27  iMOOSE
    8.28  ISIS++
    8.29  ISTV
    8.30  LAMMPS
    8.31  LFC
    8.32  libMesh
    8.33  LMPI
    8.34  LOCA
    8.35  MADCOW
    8.36  magpar
    8.37  MARMOT
    8.38  MDP
    8.39  MGRIDGEN
    8.40  MITgcm
    8.41  MM5
    8.42  MOUSE
    8.43  mpiBLAST
    8.44  MPB
    8.45  mpiP
    8.46  MPP
    8.47  MUMPS
    8.48  NaSt3DGP
    8.49  NetPIPE
    8.50  OPT++
    8.51  Overture
    8.52  PALM
    8.53  PARAMESH
    8.54  ParaSol
    8.55  ParMETIS
    8.56  pARMS
    8.57  PARPACK
    8.58  ParVox
    8.59  PaStiX
    8.60  PETSc
        8.60.1  PETSc Applications
    8.61  PHAML
    8.62  PIKAIA
    8.63  PLANSO
    8.64  PLASIM
    8.65  PMESA
    8.66  POP
    8.67  PPAT
    8.68  Prometheus
    8.69  PSPACES
    8.70  PUMA
    8.71  QCDimMPI
    8.72  RAMS
    8.73  RSL
    8.74  SAMRAI
    8.75  ScaLAPACK
    8.76  SDPARA
    8.77  SGOPT
    8.78  SLEPc
    8.79  SMS
    8.80  Snark
    8.81  SPAI
    8.82  Sphinx
    8.83  S+
    8.84  SPOOLES
    8.85  SUNDIALS
    8.86  SuperLU
    8.87  SWOM
    8.88  Towhee
    8.89  Trilinos
        8.89.1  AztecOO
        8.89.2  Epetra
        8.89.3  IFPACK
        8.89.4  ML
        8.89.5  TriUtils
    8.90  TRLan
    8.91  UG
    8.92  UPC
    8.93  WAVEWATCH
    8.94  WRF
    8.95  WSMP
    8.96  Zoltan
    8.97  ZPL
9  Miscellaneous Odd Jobs
    9.1  RPM Source Packages
    9.2  Flashing the BIOS
10  Miscellaneous Documentation
    10.1  Anaconda
        10.1.1  Overview
        10.1.2  Install Mechanism Summary
        10.1.3  Patching the Installer
        10.1.4  Invocation Options
        10.1.5  Further Information
    10.2  Kickstart
        10.2.1  Introduction
        10.2.2  Kickstart Options
        10.2.3  Package Selection
        10.2.4  Pre-installation Script
        10.2.5  Post-installation Script
        10.2.6  Making the Kickstart File Available

Chapter 1
Introduction

I cobbled together this document because I wanted something that wasn't available, i.e. a summary of all the packages contained within the NPACI Rocks main distribution and the various Rolls available as supplements. More specifically, I wanted a one-stop solution to the problem of figuring out what exactly is available and how I might use it. I've also included a nascent "Hints and Hacks" section wherein I've picked out some posts on the mailing list that have been specifically helpful to me, and thrown a bit of formatting at them. Chapter 1 provides a brief introduction to NPACI Rocks. Chapter 2 offers some hints and hacks for various problems that might be encountered.
Any and all accusations of originality will be vigorously denied. Enjoy.
By the way, we're mostly using our wee Rocks cluster to run ROMS, the Regional Ocean Model System.

1.1  Overview

Source:
http://www.rocksclusters.org/Rocks/

1.1.1  Tools

The NPACI distribution is installed and maintained with the help of some tools.

Active node configuration management

Nodes are installed using the RedHat kickstart tool which is driven by a text-based configuration file. This file contains all the package names to install as well as post-processing commands. In Rocks the kickstart files are dynamic, i.e. they are actively managed by building them on-the-fly with a CGI script. The script's functions are:
There are two types of XML-based configuration files. The roots of the graph represent "appliances" such as compute and frontend. This XML-based installation infrastructure describes all node behaviors.
The installation procedure involves several steps. This method is very flexible, allowing heterogeneous hardware to be supported as easily as homogeneous hardware.

Distribution management with rocks-dist

Documentation:
Bibliography
http://www.rocksclusters.org/rocks-documentation/3.1.0/bibliography.html

1.2  Components

Chapter 2
Hints and Hacks

NPACI FAQ
http://www.rocksclusters.org/rocks-documentation/3.1.0/faq.html
NPACI Discussion List Archives
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/

2.1  Mailing List Excerpts

Useful discussion list excerpts include:

2.1.1  Changing Hostnames

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2004-February/004542.html

We are seeing more frequent user problems due to people changing their hostnames w/o understanding how critical the hostname is to a machine and all of the configured services. This is understandable since hostnames are not very critical on most computers (desktops); unfortunately, the frontend of a cluster is not like most computers. I'm going to add something about this to our users guide, but I'm going to make the point here first (mainly for the list archives).

Please do not change hostnames after installation.

Changing the hostname of the frontend machine in your cluster is like changing the name of a 10 year old dog. If you change your dog's name, do not expect him to answer when you call. This is exactly what happens on your frontend machine. There are several cluster-critical services that all key off of the frontend hostname, and due to the nature of UNIX this hostname is peppered through over a dozen configuration files on your frontend and compute nodes. While it is reasonably safe to change the name of a desktop system, this is not the case for a server that controls your entire cluster. It has taken us several years of Rocks development to converge on a naming scheme that works for all cluster/grid software (SGE, PBS, Globus, CAs, DNS, NFS, NIS/411, ...). While this naming scheme may not be preferred by some users, diverging from it has a very high risk of breaking your cluster. The first step in diagnosing any issues with Rocks is to make sure no changes have been made to hostnames.

2.1.2  MySQL and ROCKS

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2004-February/004520.html

Q: How does one configure a password to restrict editing the MySQL database contents to root only?

A: The mysql database is password protected by default. If you use the web interface (phpMyAdmin), it operates as the user "apache" that has limited access to the Cluster database without a password (basic modification rights on data, no rights to change table/schema structure).

However, by default Apache restricts access to the phpMyAdmin site to hosts inside the cluster. If you try to connect to the database as a normal user, you will be denied.

Q: Can the Rocks MySQL database be restored intact from a binary copy of the database files in use?

A: As for the backups, the safest way is to make periodic backups with
# mysqldump --opt cluster > cluster.sql

This will create a plain text file with SQL commands to completely restore the state of your database. If you make a cron job to do this every few hours, your nightly backup should pick it up just fine.
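
For example, a cron entry along these lines (a sketch; the file name, dump path, and schedule are arbitrary choices, not Rocks defaults) would keep a recent dump on disk:

# /etc/cron.d/cluster-db-backup -- dump the cluster database every 6 hours
0 */6 * * * root /usr/bin/mysqldump --opt cluster > /export/backups/cluster.sql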

2.1.3  NIC Cards

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2004-February/004514.html

Q: I have installed the front-end completely and it works. I have Intel express 1000 NIC cards and they install fine. I also have an onboard 3c940 gigabit NIC on my Asus P4P8X mobo. What do I need to do so that I can use the onboard NIC instead of the Intel one? (I don't want to have to buy Intel cards for all my nodes, and the onboard NIC is not recognized by Rocks.) I was told before that the new beta version of Rocks 3.1.1 would recognize it, but it did not work. I created a driver disk and it still would not recognize the 3c940 card.

A: I had this problem with a similar unit for a customer under RH9. The driver worked after recompiling.

What would need to be done here is to place that driver, compiled for the same kernel that is used in the booting environment, into the boot environment. As I have discovered by trying to do this in the past, it is a non-trivial undertaking. I had not been able to make it work.

If you can create a real device driver disk, you might be able to use it with the CDROM install. This would mean 2 disks (a floppy and a CDROM), but it is better than not working. I was able to make this work for one unit I tried, though building driver disks is also not fun.

If Greg or any of the ROCKS folk can tell you the name of the "default" booted kernel (I think it is simply "linux"), you should be able to use
	linux dd

to boot the unit and load the device driver. [Note: The default kernel name on the ROCKS CD is internal, so replace linux with internal here.]

The time required to build a device driver disk may be somewhat large though. Compare the costs of 16 of the e1000 cards to several hours of your time. If the number of hours is over 8 or so, you might find it more economical to buy the cards than to build the drivers for them.

See http://people.redhat.com/dledford/README.mod_devel_kit.html and google other locations for instructions.

2.1.4  Custom Partitioning

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2004-February/004509.html

2.1.5  Hyperthreading or Not

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2004-February/004458.html

Q: I wanted to see if I could get opinions on whether or not to use hyperthreading on nodes of a cluster generally used for batch computing.

A: Regardless, hyperthreading is a double-edged sword. Unless the Linux process scheduler is 100% HT-aware, you can lose performance with HT turned on (consider the case with 4 virtual cpus and 2 processes running). In our experience (2.4 kernels up to 2.6.2-rc2) the scheduler has never worked well enough to outweigh the large slowdowns if 2 tasks ever get put onto one physical cpu.

Run a few benchmarks on tasks you are interested in and see if HT is actually a win for you. It seems interrupts are mapped nicely only to physical cpus in later 2.6's, which is an improvement over the 2.4 HT kernels.

Addendum: You can be very precise about process<->processor assignments and scheduling by using the cpumemsets kernel patches (http://oss.sgi.com/projects/cpumemsets), and the associated utilities. SGI uses this on the Altix to get meaningful application scaling.
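
If you are unsure whether hyperthreading is active on a node, one quick sanity check (assuming the kernel reports a "physical id" field in /proc/cpuinfo, as 2.4/2.6 HT kernels do) is to compare logical and physical CPU counts:

# count the logical CPUs the kernel sees
grep -c ^processor /proc/cpuinfo
# count the distinct physical packages; if the first number is twice the
# second, hyperthreading is enabled
grep '^physical id' /proc/cpuinfo | sort -u | wc -l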

2.1.6  Intel Compiler Migration to Nodes

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2004-January/004422.html

Q: I noticed the Intel C/C++ and Fortran compilers are installed on the front-end after my install. How do I migrate them to the compute nodes?

A: The design thinking is to develop on the frontend, and run on the compute nodes. To put the compilers on the nodes, edit:
/home/install/profiles/current/graphs/default/intel.xml

and add this:
<edge from=3D"compute">
         <to>intel</to>
 </edge>

right below this:
<edge from=3D"frontend">
         <to>intel</to>
 </edge>

and then reinstall all the compute nodes with:
ssh-add
cluster-fork /boot/kickstart/cluster-kickstart

Comment: Unless you're planning to statically link everything, you need the libraries on all the nodes. And doing your own packaging of the Intel compiler libs is a pain.

The solution is to install the compilers on all the nodes, but only put the license on the frontend. This allows you to build on the frontend, but run across the cluster.

Comment: I find it easier to install programs into /opt/apps or /opt/programs and export to the rest of the cluster (fixed mount point, not automounted). Then simply set the ld.so.conf on each node to include the /opt/apps/strange_app_1/lib as needed, and rerun ldconfig. Not sure how easy this is with extend-compute.xml. As you might guess I do this another way (very easy).
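
A minimal sketch of that ld.so.conf approach, run from the frontend (the /opt/apps path is just an example, and it assumes the export is already mounted on the nodes):

ssh-add
# append the library directory to each node's linker configuration
cluster-fork 'echo /opt/apps/strange_app_1/lib >> /etc/ld.so.conf'
# rebuild the shared library cache on every node
cluster-fork ldconfig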

Comment: Just want to reiterate an architectural tenet for Rocks - We have endeavored to design things so that shared NFS is not necessary for proper functioning. With the exception of user home areas, our default configuration adheres to this.

Installation of the compiler/libs onto nodes is trivial.

The other option of a fixed NFS mount is similarly trivial. The following is a complete extend-compute.xml file that accomplishes this:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE kickstart SYSTEM "@KICKSTART_DTD@">
<kickstart>
<post>
<file name="/etc/fstab" mode="append">
<var name="Kickstart_PrivateHostname"/>:/path/to/export /path/to/mount 
nfs defaults 0 0
</file>
</post>
</kickstart>

You can run (in /home/install) ./kickstart.cgi -client=compute-0-0 to get the kickstart file that will be created for your node. If you inspect this output, you will see that the following has been generated:
(
cat >> /etc/fstab << 'EOF'
slic00:/path/to/export /path/to/mount nfs defaults 0 0
EOF
)

The important part is the 2 lines before the EOF. They append the line to the fstab. The name of the local host has been dereferenced in the "var" statement (in this case to slic00). You will also notice (not posted here) some RCS manipulations. This allows multiple xml files to manipulate a config file and adhere to each other's changes.

Chapter 3
NPACI Rocks Base/HPC Roll

NPACI Info:
http://www.rocksclusters.org/rocks-documentation/3.1.0/
Kickstart Nodes

  1. 411
  2. 411-client
  3. 411-server
  4. apache
  5. autofs
  6. autofs-client
  7. autofs-server
  8. base
  9. c-development - Minimalist C development support, i.e. everything needed to compile the kernel.
  10. cdr - CDR tools for burning, ripping, encoding, etc.
  11. client - A file used as a connection point for other XML configuration tools.
  12. cluster-db - Cluster database.
  13. cluster-db-data - Populate cluster database with initial data.
  14. cluster-db-structure - Cluster database SQL table structure.
  15. devel - A file used as a connection point for other XML configuration nodes.
  16. dhcp-server - Set up the DHCP server for the cluster.
  17. disk-stamp - Obtain a root partition.
  18. dns-server - Configures a DNS nameserver for the cluster on the front end.
  19. elilo - IA-64 bootloader support.
  20. emacs - Emacs editor.
  21. fortran-development
  22. fstab - Examine the disks and see if there are existing, non-root partitions that should be preserved.
  23. grub - IA-32 bootloader support.
  24. install - Everything needed to kickstart the compute nodes.
  25. installclass - The base installclass files.
  26. installclass-client
  27. installclass-server
  28. ip-diag - TCP/IP network diagnostic tools.
  29. keyboard - USB keyboard support for IA-64.
  30. lilo - IA-32 bootloader support.
  31. logrotate - Append rules to logrotate to prune files in /var/log.
  32. media-server - Root for kickstart files on the CD/DVD.
  33. nis - Private side NIS.
  34. nis-client
  35. nis-server
  36. node - A machine in the cluster.
  37. node-thin - For turning off packages non-essential for parallel applications.
  38. nsswitch-files - UNIX files for all lookups.
  39. nsswitch-nis - UNIX files for NIS lookups.
  40. ntp - Network Time Protocol.
  41. ntp-client
  42. ntp-server
  43. perl-development
  44. python-development
  45. rocks-dist
  46. rpc
  47. scripting
  48. server
  49. ssh
  50. ssl
  51. syslog
  52. syslog-client
  53. syslog-server
  54. tcl-development
  55. x11
  56. x11-thin
Components:
The components of the NPACI Rocks base distribution include:
Further details of these components are presented in the following sections.

3.1  PBS

Source:
http://www.openpbs.org/
OpenPBS is the original version of the Portable Batch System. It is a flexible batch queueing system developed for NASA in the early to mid-1990s. It operates on networked, multi-platform UNIX environments.
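
As a quick illustration (a generic sketch, not taken from the Rocks or OpenPBS documentation; the resource requests and executable are placeholders), a minimal PBS job script and its submission might look like:

#!/bin/sh
# request 2 nodes with 2 processors each; merge stdout and stderr
#PBS -N hello
#PBS -l nodes=2:ppn=2
#PBS -j oe
cd $PBS_O_WORKDIR
# $PBS_NODEFILE lists the hosts PBS assigned to this job
mpirun -np 4 -machinefile $PBS_NODEFILE ./a.out

Submit the script with qsub and monitor it with qstat:

qsub hello.sh
qstat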

3.2  MPICH

Source:
http://www-unix.mcs.anl.gov/mpi/mpich/

Documentation:
http://www-unix.mcs.anl.gov/mpi/mpich/docs.html
MPICH provides compiler wrapper scripts such as mpicc and mpif77, along with the mpirun job launcher; a short example follows.
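
A hedged compile-and-run cycle on the frontend (the program and node names are placeholders; the wrappers assume MPICH is on your PATH):

# compile an MPI program with the MPICH wrapper
mpicc -o cpi cpi.c
# list the compute nodes to use, one per line
cat > machines <<EOF
compute-0-0
compute-0-1
EOF
# start 2 processes across those nodes
mpirun -np 2 -machinefile machines ./cpi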

3.2.1  MPD

NPACI Info:
http://www.rocksclusters.org/rocks-documentation/3.1.0/mpd.html
MPD is a new high-performance job launcher developed by Argonne National Laboratory, the makers of MPICH. It serves as a drop-in replacement to mpirun, and can be used to launch parallel jobs. MPD can start both MPI and non-MPI parallel applications.

Documentation:
MPICH User Guide
http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4mpd/node53.htm
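
A hedged sketch of day-to-day MPD use (command names follow the MPICH mpd documentation; exact behavior varies between MPICH versions):

# show which nodes are currently members of the mpd ring
mpdtrace
# launch a job through the mpd-enabled mpirun
mpirun -np 4 ./a.out
# tear the ring down (normally only if you started it yourself)
mpdallexit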

3.3  Cluster-Fork

Source:
http://www.rocksclusters.org/rocks-documentation/3.1.0/launching-interactive-jobs.html#CLUSTER-FORK
Often we want to execute parallel jobs consisting of standard UNIX commands. By "parallel" we mean that the same command runs on multiple nodes in the cluster. We use this type of job to move files, run small tests, and to perform various administrative tasks. Rocks provides a simple tool for this purpose called cluster-fork.
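
For example (the commands here are generic):

ssh-add
# report the load on every compute node
cluster-fork uptime
# list your own processes cluster-wide
cluster-fork ps -U$USER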

3.4  PVM

Source:
http://www.csm.ornl.gov/pvm/pvm_home.html
PVM (Parallel Virtual Machine) is a software package that permits a heterogeneous collection of Unix and/or Windows computers hooked together by a network to be used as a single large parallel computer. Thus large computational problems can be solved more cost effectively by using the aggregate power and memory of many computers. The software is very portable. The source, which is available free through netlib, has been compiled on everything from laptops to CRAYs.
PVM enables users to exploit their existing computer hardware to solve much larger problems at minimal additional cost. Hundreds of sites around the world are using PVM to solve important scientific, industrial, and medical problems in addition to PVM's use as an educational tool to teach parallel programming. With tens of thousands of users, PVM has become the de facto standard for distributed computing world-wide.
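
A brief, hedged example of the PVM console (host names are placeholders; the console commands are described in the PVM book cited below):

pvm                      # start the console and a local pvmd
pvm> add compute-0-0     # add another host to the virtual machine
pvm> conf                # show the current virtual machine configuration
pvm> halt                # shut down all daemons and exit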

Documentation:
FAQ
http://www.netlib.org/pvm3/faq_html/faq.html
PVM Book
http://www.netlib.org/pvm3/book/pvm-book.html

3.5  HPL

Source:
http://www.netlib.org/benchmark/hpl/
HPL is a software package that solves a (random) dense linear system in double precision (64 bits) arithmetic on distributed-memory computers. It can thus be regarded as a portable as well as freely available implementation of the High Performance Computing Linpack Benchmark.
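
Running HPL amounts to editing the HPL.dat input file (problem size N, block size NB, and the P x Q process grid) and launching the xhpl binary under MPI; a hedged sketch for 4 processes, so P x Q should be 2 x 2 (the architecture directory name is only an example):

cd hpl/bin/Linux_P4
mpirun -np 4 -machinefile machines ./xhpl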

Documentation:
HPL Functions:
http://www.netlib.org/benchmark/hpl/documentation.html
FAQ:
http://www.netlib.org/benchmark/hpl/faqs.html
Tuning: http://www.netlib.org/benchmark/hpl/tuning.html

3.6  Ganglia

Source:
http://ganglia.sourceforge.net/
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency.
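
Two quick ways to poke at a running Ganglia installation (the web path assumes the Rocks frontend serves the Ganglia pages, which is the Rocks default):

# gmond publishes its current XML state on TCP port 8649
telnet localhost 8649
# the monitoring web pages are served by the frontend, typically at
#   http://<frontend>/ganglia/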

Documentation:
http://ganglia.sourceforge.net/

3.7  411 Secure Information Service

Source:
http://www.rocksclusters.org/rocks-documentation/3.1.0/service-411.html
The 411 Secure Information Service provides NIS-like functionality for Rocks clusters. It is named after the common "411" code for information in the phone system. We use 411 to securely distribute password files, user and group configuration files and the like.
411 uses Public Key Cryptography to protect files' contents. It operates on a file level, rather than the RPC-based per-line maps of NIS. 411 does not rely on RPC, and instead distributes the files themselves using HTTP (web service). Its central task is to securely maintain critical login/password files on the worker nodes of a cluster. It does this by implementing a file-based distributed database with weak consistency semantics. The design goals of 411 include scalability, security, low latency when changes occur, and resilience to failures.

Documentation:
411 for Certain:
If you see that a user or passwd-related change on the frontend is not propagating to all the nodes, it could be that the 411 multicast alert packet got lost along the way. Although 411 is designed for this eventuality, it may be up to a day before all the nodes are in sync.
If you suspect that your 411 system is lagging, please run the following commands on the frontend node.
# make -C /var/411
# cluster-fork "411get --all"

This command set guarantees that all your compute nodes will successfully retrieve the login-file changes made on the frontend.

3.8  phpMyAdmin

Source:
http://www.phpmyadmin.net/
phpMyAdmin is a tool written in PHP intended to handle the administration of MySQL over the WWW. Currently it can create and drop databases, create/drop/alter tables, delete/edit/add fields, execute any SQL statement, manage keys on fields, manage privileges, export data into various formats, and is available in 47 languages.

Documentation:
http://www.phpmyadmin.net/documentation/

3.9  KickStart

Source:
http://wwwcache.ja.net/dev/kickstart/
One of the key ingredients of Rocks is a robust mechanism to produce customized distributions (with security patches pre-applied) that define the complete set of software for a particular node. A cluster may require several node types including compute nodes, frontend nodes, file servers, and monitoring nodes. Each of these roles requires a specialized software set. Within a distribution, different node types are defined with a machine specific Red Hat Kickstart file, made from a Rocks Kickstart Graph.
A Kickstart file is a text-based description of all the software packages and software configuration to be deployed on a node. The Rocks Kickstart Graph is an XML-based tree structure used to define RedHat Kickstart files. By using a graph, Rocks can efficiently define node types without duplicating shared components. Similar to mammalian species sharing 80% of their genes, Rocks node types share much of their software set. The Rocks Kickstart Graph easily defines the differences between node types without duplicating the description of their similarities. See the Bibliography section for papers that describe the design of this structure in more depth.
By leveraging this installation technology, we can abstract out many of the hardware differences and allow the Kickstart process to autodetect the correct hardware modules to load (e.g., disk subsystem type: SCSI, IDE, integrated RAID adapter; Ethernet interfaces; and high-speed network interfaces). Further, we benefit from the robust and rich support that commercial Linux distributions must have to be viable in today's rapidly advancing marketplace.

3.10  MySQL

Source:
http://www.mysql.com/
The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software.
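
On a Rocks frontend the cluster configuration itself lives in a MySQL database named "cluster" (the one dumped by mysqldump above); a hedged way to look around it, where access details depend on your local MySQL accounts and the table names are whatever your Rocks version created:

# list the tables in the Rocks cluster database
mysql cluster -e 'SHOW TABLES'
# then inspect one of them, for example
mysql cluster -e 'SELECT * FROM nodes'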

3.11  PVFS

Source:
http://www.parl.clemson.edu/pvfs/
The goal of the Parallel Virtual File System (PVFS) Project is to explore the design, implementation, and uses of parallel I/O. PVFS serves as both a platform for parallel I/O research as well as a production file system for the cluster computing community. PVFS is currently targeted at clusters of workstations, or Beowulfs.
PVFS supports the UNIX I/O interface and allows existing UNIX I/O programs to use PVFS files without recompiling. The familiar UNIX file tools (ls, cp, rm, etc.) will all operate on PVFS files and directories as well. This is accomplished via a Linux kernel module which is provided as a separate package.
PVFS stripes file data across multiple disks in different nodes in a cluster. By spreading out file data in this manner, larger files can be created, potential bandwidth is increased, and network bottlenecks are minimized. A 64-bit interface is implemented as well, allowing large (more than 2GB) files to be created and accessed.
Multiple user interfaces are available.
Documentation:
Quick Start Guide to PVFS
http://www.parl.clemson.edu/pvfs/quick.html
Using the Parallel Virtual File System
http://www.parl.clemson.edu/pvfs/user-guide.html
PVFS FAQ
http://www.parl.clemson.edu/pvfs/pvfs-faq.html

Chapter 4
SGE Roll - Grid Engine

NPACI Info:
http://www.rocksclusters.org/roll-documentation/sge/3.1.0/
The SGE Roll installs and configures the SUN Grid Engine scheduler.

Source
http://gridengine.sunsource.net/
The Grid Engine project is an open source community effort to facilitate the adoption of distributed computing solutions. Sponsored by Sun Microsystems and hosted by CollabNet, the Grid Engine project provides enabling distributed resource management software for wide ranging requirements from compute farms to grid computing.

Documentation:
Grid Engine Man Pages
http://gridengine.sunsource.net/unbranded-source/browse/*checkout*/gridengine/doc/htmlman/index.html?content-type=text/html

Chapter 5
Grid Roll - NMI

NPACI Info:
http://www.rocksclusters.org/roll-documentation/grid/3.1.0/
The Rocks Grid Roll uses the NSF Middleware Initiative (NMI) Release 3.1 to provide Globus connectivity for Rocks clusters. NMI R3.1 is a bundling of Globus, Condor, NWS, MDS, and other grid middleware into a single distribution. NMI uses the Globus Packaging Toolkit (GPT) to manage software packages. The Rocks Grid Roll builds on the very good work by the NMI team, to seamlessly install the de facto standard grid middleware on Rocks Clusters.

Source:
http://www.nsf-middleware.org/
The National Science Foundation Middleware Initiative (NMI) addresses a critical need for software infrastructure to support scientific and engineering research. Begun in late 2001, NMI funds the design, development, testing, and deployment of middleware, a key enabling technology upon which customized applications are built. Specialized NMI teams are defining open-source, open-architecture standards that are creating important new avenues of on-line collaboration and resource sharing. In addition to the production-quality software and implementation standards created by those large systems-integration teams, NMI funds smaller projects that focus on experimental middleware applications.
The GRIDS Center Software Suite includes the Globus Toolkit, Condor-G, Network Weather Service, Gridconfig Tools, and GSI-Enabled OpenSSH. New with this release from the GRIDS Center are MPICH-G2 and MyProxy. Components from the NMI-EDIT team include KX.509/KCA (which is bundled in the GRIDS Center Software Suite), KX.509/KCA Stand-Alone (for use outside the Globus environment), Shibboleth 1.0, OpenSAML 1.0, Pubcookie 3.0, and CPM 1.1. New with this release from the NMI-EDIT team are PERMIS, Look, SAGE, and the Enterprise Directory Implementation Roadmap. Some of these components were contributed by participants in the Internet2 Middleware Initiative. NMI-EDIT software supports a number of common platforms.

Component Hierarchy:
The components in the NMI suite are classified in the hierarchy shown below. Each component is briefly explained in a succeeding section.

5.1  NMI Client/Server

Source:
http://www.nsf-middleware.org/NMIR3/components/nmi.asp
The NMI Client and Server Bundles are an aggregate of all of the software components in the Grids Center Software Suite. This integrated package can help make installation and configuration easier for those who want to implement all or most of the technologies in this set.
Documentation:
NMI Client/Server Bundles
http://www.nsf-middleware.org/documentation/NMI-R3/0/All/index.htm

5.2  Globus Toolkit

Source:
http://www.globus.org/
The de facto standard for Grid computing, the Globus Toolkit is an open-source, modular "bag of technologies" that simplifies collaboration across dynamic, multi-institutional virtual organizations. It includes tools for authentication, scheduling, file transfer, and resource description.
The Globus Project is a partnership of Argonne National Laboratory, the University of Southern California Information Sciences Institute, and the University of Chicago. Since its 1996 inception, the project has been dedicated to the open-source philosophy of sharing resources to maximize progress and community benefits. The toolkit features software services and libraries for resource monitoring, discovery, and management, plus security and file management. It is now central to science and engineering projects that total nearly a half-billion dollars internationally, providing the substrate on which many companies are building significant commercial Grid products.
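
A hedged sketch of the classic GT2-style command-line workflow (the host name is a placeholder):

# create a short-lived proxy credential from your user certificate
grid-proxy-init
# check the remaining lifetime of the proxy
grid-proxy-info
# run a command on a remote gatekeeper
globus-job-run remote.host.edu /bin/hostname
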
Documentation:
Globus Toolkit Documentation
http://www-unix.globus.org/toolkit/documentation.html

5.3  Condor-G

Source:
http://www.cs.wisc.edu/condor/
Condor-G is a computation management agent for the Grid. Condor-G is the marriage of technologies from the Condor project and the Globus project.
Condor-G provides the grid computing community with a powerful, full-featured task broker to manage jobs destined to run on resources accessible via Globus gatekeepers. Used as a front-end to a computational grid, Condor-G can manage thousands of jobs destined to run at distributed sites and provide job monitoring, logging, notification, policy enforcement, fault tolerance, credential management, and handle complex job interdependences. Condor-G's flexible and intuitive commands are appropriate for use directly by end-users, or for interfacing with higher-level task brokers and web portals.
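
A minimal, hedged Condor-G submit description (the gatekeeper contact string is a placeholder; this uses the old-style "globus" universe of that era):

# hello.submit -- send one job to a Globus gatekeeper via Condor-G
universe        = globus
globusscheduler = remote.host.edu/jobmanager-pbs
executable      = /bin/hostname
output          = hello.out
log             = hello.log
queue

Submit it with condor_submit hello.submit and watch it with condor_q.
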
Documentation:
Manuals
http://www.cs.wisc.edu/condor/manual/
FAQs
http://www.cs.wisc.edu/condor/manual/faq.html
Tutorials
http://www.cs.wisc.edu/condor/tutorials/

5.4  Network Weather Service

Source:
http://nws.cs.ucsb.edu/
The Network Weather Service is a distributed system that periodically monitors and dynamically forecasts the performance various network and computational resources can deliver over a given time interval. The service operates a distributed set of performance sensors (network monitors, CPU monitors, etc.) from which it gathers readings of the instantaneous conditions. It then uses numerical models to generate forecasts of what the conditions will be for a given time frame. We think of this functionality as being analogous to weather forecasting, and as such, the system inherits its name.
Currently, the system includes sensors for end-to-end TCP/IP performance (bandwidth and latency), available CPU percentage, and available non-paged memory. The sensor interface, however, allows new internal sensors to be configured into the system.
Documentation:
http://nws.cs.ucsb.edu/users_guide.html
NWS consists of four main programs that start daemon processes:
There are also several utility programs including:

5.5  KX.509/KCA

http://www.citi.umich.edu/projects/kerb_pki/
KX.509 and KCA provide a bridge between a Kerberos and PKI infrastructure. This technology is included in NMI-R3 to enable the PKI-based security infrastructure of the Globus Toolkit to integrate with Kerberos-based authentication implemented at university campuses.
KCA 1.0 (Kerberized Certificate Authority) receives a Kerberos ticket and issues a short-term PKI certificate. KX.509 1.0 is the desktop client that issues a request to the KCA and manages the returned certificate.
There are five major components to K-PKI:

5.6  GSI OpenSSH

http://grid.ncsa.uiuc.edu/ssh/

GSI-OpenSSH is a modified version of OpenSSH that adds support for GSI authentication, providing a single sign-on remote login capability for the Grid. GSI-OpenSSH can be used to login to remote systems and transfer files between systems without entering a password, relying instead on a valid GSI credential for operations requiring authentication. GSI-OpenSSH provides a single sign-on capability since it can also forward GSI credentials to the remote system on login, so GSI commands (including GSI-OpenSSH commands) can be used on a remote system without the need to manually create a new GSI proxy credential on that system.
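
Hedged usage sketch (the host name is a placeholder):

# obtain a GSI proxy first
grid-proxy-init
# log in without a password, authenticating with the proxy
gsissh remote.host.edu
# copy files the same way
gsiscp data.tar remote.host.edu: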

5.7  MyProxy

http://grid.ncsa.uiuc.edu/myproxy/

MyProxy is a credential repository for the Grid. Storing your Grid credentials in a MyProxy repository allows you to retrieve a proxy credential whenever and wherever you need one, without worrying about managing private key and certificate files. Using a standard web browser, you can connect to a Grid portal and allow the portal to retrieve a proxy credential for you to access Grid resources on your behalf. You can also allow trusted servers to renew your proxy credential using MyProxy, so, for example, your long-running tasks don't fail because of an expired proxy credential. A professionally managed MyProxy server can provide a more secure storage location for Grid credentials than typical end-user systems.
MyProxy provides a set of flexible authorization mechanisms for controlling access to the repository. Server-wide policies allow the MyProxy administrator to control how the repository may be used. Per-credential policies allow users to specify how each credential may be accessed. Passphrase and/or certificate-based authentication is required to retrieve credentials from MyProxy. If a credential is stored with a passphrase, the private key is encrypted with the passphrase in the MyProxy repository.
MyProxy Man Pages
http://grid.ncsa.uiuc.edu/myproxy/man/
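
A hedged example of the basic store-and-retrieve cycle (the server name and username are placeholders; command names follow the MyProxy man pages of this era):

# delegate a credential to the repository
myproxy-init -s myproxy.host.edu
# later, from any machine, retrieve a short-lived proxy
myproxy-get-delegation -s myproxy.host.edu -l username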

5.8  MPICH-G2

http://www3.niu.edu/mpi/

MPICH-G2 is a grid-enabled implementation of the MPI v1.1 standard based on the popular MPICH library developed at Argonne National Laboratory. That is, using services from the Globus Toolkit(R) (e.g., job startup, security), MPICH-G2 allows you to couple multiple machines, potentially of different architectures, to run MPI applications. MPICH-G2 automatically converts data in messages sent between machines of different architectures and supports multiprotocol communication by automatically selecting TCP for intermachine messaging and (where available) vendor-supplied MPI for intramachine messaging.

5.9  Grid Packaging Tools

http://www.ncsa.uiuc.edu/Divisions/ACES/GPT/

The Grid Packaging Tools (GPT) are a collection of packaging tools built around an XML-based packaging data format. This format provides a straightforward way to define complex dependency and compatibility relationships between packages. The tools provide a means for developers to easily define the packaging data and include it as part of their source code distribution. Binary packages can be automatically generated from this data. The packages defined by GPT are compatible with other packages and can easily be converted. GPT provides tools that enable collections of packages to be built and/or installed. It also provides a package manager for those systems that do not have one.

GPT Man Pages
http://www.ncsa.uiuc.edu/Divisions/ACES/GPT/manpages/current/

5.10  Gridconfig Tools

http://rocks.npaci.edu/nmi/gridconfig/overview.html

GridConfig tools are used to configure and fine-tune Grid technologies. They provide an easy way to generate and regenerate configuration files in native formats, and to ensure consistency within and among applications. GridConfig lets the user manage multiple configuration files through a uniform interface that does not alter how the native components store their settings. It relies on a simple database of parameters that, when edited, can be easily regenerated to maintain a consistent configuration among the various GRIDS components.

5.11  Pubcookie

http://www.pubcookie.org/

Pubcookie is an example of a "WebISO" package: a system designed to allow users with standard web browsers to authenticate to web-based services across many web servers, using a standard central authentication service (typically username/password).
Pubcookie consists of a standalone login server and modules for common web server platforms like Apache and Microsoft IIS. Together, these components can turn existing authentication services (like Kerberos, LDAP, or NIS) into a solution for single sign-on authentication to websites throughout an institution.
The components of Pubcookie are:

5.12  Shibboleth

http://shibboleth.internet2.edu/

Shibboleth is an open-source, standards-based tool providing mechanisms for controlling access to web based resources (even in inter-institution use), while offering options for protecting personal privacy. It consists of origin site software (Handle Server and Attribute Authority) which manages the release of attribute information, and target side software (modules for the Apache web server) which manages user sessions, obtains user attributes, and makes access control decisions. Together, these components provide an inter-institutional access control framework that allows for the preservation of personal privacy.

5.13  OpenSAML

http://www.opensaml.org/

OpenSAML is a set of open-source libraries in Java and C++ which can be used to build, transport, and parse SAML messages. OpenSAML is able to transform the individual information fields that make up a SAML message, build the correct XML representation, and unpack and process the XML before handing it off to a recipient. OpenSAML fully supports the SAML browser/POST profile for web sign-on, and supports the SOAP binding for exchange of attribute queries and attribute assertions. It does not currently support the browser/artifact profile or other SAML messages involving authorization decisions.

5.14  CPM

http://middleware.internet2.edu/hepki-tag/

CPM (Certificate Profile Maker) is a CGI program package for making a certificate profile in XML format. It simultaneously produces a sample X.509 certificate in XML format according to the certificate profile.

5.15  KX.509 and KCA

http://www.citi.umich.edu/projects/kerb_pki/

KX.509 and KCA provide a bridge between a Kerberos and PKI infrastructure. These tools enable the PKI-based security infrastructure of the Globus Toolkit to integrate with Kerberos-based authentication implemented at university campuses. KCA 1.0 (Kerberized Certificate Authority) receives a Kerberos ticket and issues a short-term PKI certificate. KX.509 1.0 is the desktop client that issues a request to the KCA and manages the returned certificate.

5.16  PERMIS

http://sec.isi.salford.ac.uk/permis/

PERMIS is an authorisation infrastructure that uses X.509 attribute certificates (ACs) to hold the credentials assigned to users. These ACs are stored in and retrieved from LDAP directories. PERMIS uses hierarchical Role Based Access Controls, where the X.509 ACs hold a user's roles, and superior roles inherit the privileges of subordinate roles. (However the definition of a role is very loose, and can in fact be any certified attribute of the user, such as a qualification or a membership certificate). PERMIS makes granted or denied access control decisions to a resource, based on a policy and the credentials of the user. The policy is written in XML by the administrator of the resource, and then encapsulated in an X.509 AC and stored in the LDAP entry of that administrator. PERMIS supports the distributed management of roles, as it will search in multiple LDAP directories for ACs issued by multiple Sources of Authority. PERMIS does not mandate any particular authentication mechanism, as user authentication is left entirely up to the application. All that PERMIS requires is the authenticated LDAP DN of the user. PERMIS is accessed via a simple to use Java API, making it relatively easy to incorporate into existing implementations.

5.17  Look

http://middleware.internet2.edu/dir/look/

Look is a utility written in Perl which gathers LDAP performance data at periodic intervals and generates a file of summary results in a format compatible with the open source ORCA web graphing product. Look is capable of retrieving information from the directory log (currently only iPlanet Directory Server 4.x), as well as querying the LDAP directory directly to retrieve information.

5.18  LDAP Analyzer

http://ldap.mtu.edu/internet2/analyzer/index.shtml

The LDAP Analyzer Service determines the compliance of an LDAP directory server implementation with various object class definitions such as inetOrgPerson, eduPerson, and the Grid Laboratory Universal Environment (GLUE) schema, as well as the recommendations outlined in the LDAP-recipe and other best practice documents.

5.19  Certificate Profile Registry

http://middleware.internet2.edu/certprofiles/

The Certificate Profile Registry consists of a profile registry, to hold profiles for standard certificate formats for the community, and an institutional root certificate service, to provide a functional way for certificate path construction to be done within the community.

5.20  eduPerson

http://www.educause.edu/eduperson/

The EDUCAUSE/Internet2 eduPerson task force has the mission of defining an LDAP object class that includes widely-used person attributes in higher education. The group will draw on the work of educational standards bodies in selecting definitions of these directory attributes.

5.21  eduOrg

http://www.educause.edu/eduperson/

The eduOrg LDAP object class associates attributes to institutions, such as management and security policies, and can be used to discern the organizational structure of a college, for example.

5.22  commObject/H.350

http://middleware.internet2.edu/video/docs/H.350_/

H.350 defines a directory services architecture for multimedia conferencing for H.323, H.320, SIP and generic protocols. H.350 is based on the commObject architecture developed by VidMid-VC, the Video Middleware working group jointly sponsored by Internet2, ViDe and the ViDe.net project funded by the National Science Foundation.

5.23  Practices in Directory Groups

http://middleware.internet2.edu/dir/groups/internet2-mace-dir-groups-best-practices-200210.htm

Experiments and early experiences with facilitation of authorization in applications and facilitation of group messaging with use of directory services in institutions of higher education were surveyed. Several concepts, good practices, open issues, and a few principles extracted from this are presented.

5.24  LDAP Recipe

http://www.duke.edu/~gettes/giia/ldap-recipe/

This document is intended to be a discussion point toward the development of common directory deployments within the Higher Education community. In particular, a hope is to have institutions configure and populate their directories in similar ways to enable federated administration and distribution of directory data that allows applications, both client and server, to utilize directory infrastructures. Practical techniques are described and associated with other developments of the NMI such as metadirectories and group management.

5.25  Metadirectories Best Practices

http://middleware.internet2.edu/dir/metadirectories/internet2-mace-dir-metadirectories-practices-200210.htm

This document offers recommendations to the person or persons at institutions embarking on the implementation of groups. These recommendations are intended to be independent of the actual repository of the group information: LDAP directory, relational database, etc. Where possible, references are made to implementation-specific documentation.

5.26  Enterprise Directory Implementation Roadmap

http://www.nmi-edit.org/roadmap/internet2-mace-dir-implementation-roadmap-200312.html

The Enterprise Directory Implementation Process is a web-based structure of documentation and related resources that institutions can draw on to help deploy and use NMI-released tools and components pertaining to enterprise directories.

5.27  Shibboleth Architecture

http://shibboleth.internet2.edu/

Shibboleth, an Internet2/MACE project, is developing architectures, frameworks, and practical technologies to support inter-institutional sharing of resources that are subject to access controls. This paper presents the Shibboleth architecture for the secure exchange of interoperable authorization information that can be used in access control decision-making. The paper will present a high-level view of the interaction between sites and will provide a detailed behavioral description of model components and message exchange formats and protocols. One difference between Shibboleth and other efforts in the access control arena is Shibboleth's emphasis on user privacy and control over information release.

5.28  SAGE

http://middleware.internet2.edu/dir/docs/draft-internet2-mace-dir-sage-scenarios-00.html

Institutions contemplating projects which require numerous groups to be managed within their enterprise directory services often confront a variety of operational issues to do with the management of, representation of, and access to group information. The creation of a tool to facilitate these operational tasks has been identified as a high priority activity by the Internet2 MACE-Dir working group. The tool has been named SAGE, and this document is the initial step toward specifying the functional capabilities it should embody.

Chapter 6
Intel Roll

NPACI Info:
http://www.rocksclusters.org/roll-documentation/intel/3.1.0/
The main purpose of the Intel Roll is to install and configure the Intel C compiler (version 8.0) and the Intel Fortran compiler (version 8.0) for x86 or IA-64 machines. Additionally, the Intel Roll contains two pre-built MPICH environments built against the compilers (see the sketch at the end of this chapter).
Source:
http://www.intel.com/software/products/distributors/rock_cluster.htm
Documentation:
Fortran Compiler
http://www.intel.com/software/products/compilers/flin/
C/C++ Compiler
http://www.intel.com/software/products/compilers/clin/
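
A hedged sketch of using the roll's compilers (the MPICH path below is an assumption; check where the roll actually installed the Intel-built MPICH on your frontend):

# compile serial code with the Intel compilers
icc -O2 -o prog prog.c
ifort -O2 -o prog prog.f90
# for MPI codes, use the wrappers from the MPICH environment built against
# the Intel compilers, e.g. something like:
/opt/mpich/intel/bin/mpicc -o mpiprog mpiprog.c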

Chapter 7
Globus Toolkit

http://www-unix.globus.org/toolkit/

The open source Globus Toolkit is a fundamental enabling technology for the "Grid," letting people share computing power, databases, and other tools securely online across corporate, institutional, and geographic boundaries without sacrificing local autonomy. The toolkit includes software services and libraries for resource monitoring, discovery, and management, plus security and file management.
The toolkit includes software for security, information infrastructure, resource management, data management, communication, fault detection, and portability. It is packaged as a set of components that can be used either independently or together to develop applications. Every organization has unique modes of operation, and collaboration between multiple organizations is hindered by incompatibility of resources such as data archives, computers, and networks. The Globus Toolkit was conceived to remove obstacles that prevent seamless collaboration. Its core services, interfaces and protocols allow users to access remote resources as if they were located within their own machine room while simultaneously preserving local control over who can use resources and when.
The Globus toolkit components are:

7.1  OGSI

https://forge.gridforum.org/projects/ogsi-wg

OGSI defines mechanisms for creating, managing and exchanging information among entities called Grid services. A Grid service is a Web service that conforms to a set of conventions that define how a client interacts with a Grid service.

7.2  GSI - Security Infrastructure

http://www.globus.org/security/GSI3/index.html

The Grid Security Infrastructure (GSI) in the Globus Toolkit version 3 (GT3) represents the latest evolution of the GSI. GSI in GT3 builds on the functionality present in earlier GT2 toolkit releases: X.509 certificates, TLS/SSL for authentication and message protection, and X.509 Proxy Certificates for delegation and single sign-on.

7.3  System Level Services

http://www-unix.globus.org/core/

There are three basic system-level services:

7.4  GridFTP

http://www.globus.org/datagrid/gridftp.html

GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks. The GridFTP protocol is based on FTP, the highly-popular Internet file transfer protocol. We have selected a set of protocol features and extensions defined already in IETF RFCs and added a few additional features to meet requirements from current data grid projects.
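
The standard client is globus-url-copy; a hedged example (hosts and paths are placeholders):

grid-proxy-init
# third-party transfer between two GridFTP servers
globus-url-copy gsiftp://hostA.edu/tmp/big.dat gsiftp://hostB.edu/tmp/big.dat
# or pull a remote file to local disk
globus-url-copy gsiftp://hostA.edu/tmp/big.dat file:///tmp/big.dat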

7.5  RFT

http://www-unix.globus.org/toolkit/reliable_transfer.html

The Reliable Transfer Service (RFT) is an OGSA based service that provides interfaces for controlling and monitoring 3rd party file transfers using GridFTP servers. The client controlling the transfer is hosted inside of a grid service so it can be managed using the soft state model and queried using the ServiceData interfaces available to all grid services. It is essentially a reliable and recoverable version of the GT2 globus-url-copy tool and more.

7.6  RLS

http://www.globus.org/rls/

The replica location service (RLS) maintains and provides access to mapping information from logical names for data items to target names. These target names may represent physical locations of data items, or an entry in the RLS may map to another level of logical naming for the data item.
The RLS is intended to be one of a set of services for providing data replication management in grids. By itself, it does not guarantee consistency among replicated data or guarantee the uniqueness of filenames registered in the directory. The RLS is intended to be used by higher-level grid services that provide these functionalities.

7.7  GRAM

http://www-unix.globus.org/developer/resource-management.html

The Globus Toolkit includes a set of service components collectively referred to as the Globus Resource Allocation Manager (GRAM). GRAM simplifies the use of remote systems by providing a single standard interface for requesting and using remote system resources for the execution of "jobs". The most common use (and the best supported use) of GRAM is remote job submission and control. This is typically used to support distributed computing applications.
GRAM is designed to provide a single common protocol and API for requesting and using remote system resources, by providing a uniform, flexible interface to local job scheduling systems. The Grid Security Infrastructure (GSI) provides mutual authentication of both users and remote resources using GSI (Grid-wide) PKI-based identities. GRAM provides a simple authorization mechanism based on GSI identities and a mechanism to map GSI identities to local user accounts.
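
Hedged examples of the GT2-era GRAM client commands (the gatekeeper contact is a placeholder):

grid-proxy-init
# run a simple command through the remote jobmanager and wait for output
globus-job-run remote.host.edu /bin/date
# submit in batch mode; a job contact URL is returned for later status queries
globus-job-submit remote.host.edu /bin/sleep 60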

7.8  MDS

http://www.globus.org/mds/

MDS is designed to provide a standard mechanism for publishing and discovering resource status and configuration information. It provides a uniform, flexible interface to data collected by lower-level information providers. It has a decentralized structure that allows it to scale, and it can handle static or dynamic data.
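
The usual GT2-era client entry point is grid-info-search, an LDAP-style query tool; a hedged example (the host name is a placeholder and the filter is only an illustration):

# ask the MDS server on a host for everything it publishes (port 2135 is the
# conventional MDS port)
grid-info-search -h mds.host.edu -p 2135
# or restrict the query with an LDAP filter
grid-info-search -h mds.host.edu "(objectclass=MdsHost)"
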
The MDS has a hierarchical structure that consists of three main components:

Chapter 8
MPI-Enabled Packages

You've successfully installed the NPACI Rocks distribution on a cluster of machines. What now? How about running some applications? In this chapter we list quite a few software packages that can be used with an MPI installation such as the one contained in NPACI Rocks. Some of the applications are helpful for turning existing applications into those that will run on a cluster with MPI (ADAPTOR, SMS). Some provide parallel versions of basic matrix manipulation routines and/or general equation solving packages (Aztec, BLACS, MUMPS, PARPACK). Some provide general problem solving environments (Cactus, Overture, PETSc). Some are for research in specific fields (ARPS, CTSim, FermiQCD, MADCOW, mpiBLAST, SWOM). All will allow you to interact with and learn about MPI in one way or another. Note: These packages are not included in the NPACI Rocks distribution, but must be obtained separately.

8.1  ADAPTOR

http://www.scai.fraunhofer.de/291.0.html?&L=1

ADAPTOR (Automatic DAta Parallelism TranslaTOR) is a translation system for FORTRAN codes. It supports the instrumentation of FORTRAN code with directives, as well as its transformation and run time binding. The system has been developed especially to parallelise FORTRAN applications with High Performance Fortran (HPF) or with OpenMP directives, resulting in parallel code.
In addition to an extensive run time library for parallel MPI and PThreads programs, ADAPTOR incorporates a complete Fortran 90 source-to-source transformation system, modularly realized with modern compiler tools. It offers a scanner and parser as well as numerous functions for dependence analysis, transformation, and optimisation.

8.2  AMMPI

http://www.cs.berkeley.edu/~bonachea/ammpi/

Active Messages (AM) is a lightweight messaging protocol used to optimize network communications with an emphasis on reducing latency by removing software overheads associated with buffering and providing applications with direct user-level access to the network hardware. AM provides low-level asymmetric network messaging primitives which have come into popular use as a low-level substrate in the implementations of higher-level parallel languages and systems.
Many implementations of AM are highly hardware-specific, and in particular they usually require the support of high-performance, non-commodity "smart" network interfaces such as Myrinet. This unfortunately presents a problem when trying to run AM-based software on systems that have commodity network interface hardware, or network interfaces for which no AM implementation is readily available. The first part of this project attempts to bridge that gap by providing an AM-2 implementation that runs on MPI version 1.1, a standard high-performance networking layer that has been widely implemented on a number of parallel systems, and is often carefully tuned by the vendor for optimal performance. We seek to provide a compatibility layer that will allow AM-based systems to quickly get up and running on virtually any MPI-enabled platform, and turn a careful eye towards maintaining the high performance provided by the MPI layer.

8.3  APPSPACK

http://software.sandia.gov/appspack/

A package for an asynchronous parallel pattern search. APPS is an asynchronous parallel pattern search method for optimization. Pattern search uses only function values for optimization, so it can be applied to a wide variety of problems. Of particular interest to us are engineering optimization design problems characterized by a small number of variables and by expensive objective function evaluations (typically complex simulations that take minutes or hours to run). The name "pattern search" derives from the fact that a pattern of search directions is used to drive the search. Parallelism is achieved by dividing the search directions (and corresponding function evaluations) among the different processors. The "asynchronous" part comes about as a consequence of the fact that the search along each direction continues without waiting for searches along other directions to finish, in contrast to the standard parallel pattern search method.

8.4  ARPS

http://www.caps.ou.edu/ARPS/

The Advanced Regional Prediction System (ARPS) is a comprehensive regional to stormscale atmospheric modeling/prediction system. It is a complete system that includes a real-time data analysis and assimilation system, the forward prediction model, and a post-analysis package.
ARPS is the result of a CAPS project to develop a fully functioning stormscale NWP system. It is an entirely new 3-D, nonhydrostatic model system designed for the representation of convective and cold-season storms. It includes a data ingest, quality control, and objective analysis package known as ADAS (ARPS Data Analysis System), a single-Doppler radar parameter retrieval and assimilation system known as ARPSDAS (ARPS Data Assimilation System, of which ADAS is a component), the prediction model itself, and a post-processing package known as ARPSPLT.
The numerical forecast component of the ARPS is a three-dimensional, nonhydrostatic compressible model in generalized terrain-following coordinates that has been designed to run on a variety of computing platforms ranging from single-processor scalar workstations to massively parallel scalar and scalar-vector processors. This highly modular code is extensively documented and has been written using a consistent style throughout to promote ease of learning and modification as well as maintainability. The present version contains a comprehensive physics package and has been applied successfully during the past few years to real-time operational prediction of storm-scale weather over the Southern Great Plains of the United States.

Documentation:

Version 4.0 User's Guide
http://www.caps.ou.edu/ARPS/ARPS4.guide.html
Quick Start Guide
http://www.caps.ou.edu/ARPS/arpsqg/
Using the Distributed-Memory Parallel Version of ARPS
http://www.caps.ou.edu/ARPS/ARPSmpp.html

8.5  Aztec

http://www.cs.sandia.gov/CRF/aztec1.html

Aztec is a parallel iterative library for solving linear systems, which is both easy-to-use and efficient. Simplicity is attained using the notion of a global distributed matrix. The global distributed matrix allows a user to specify pieces (different rows for different processors) of his application matrix exactly as he would in the serial setting (i.e. using a global numbering scheme). Issues such as local numbering, ghost variables, and messages are ignored by the user and are instead computed by an automated transformation function. Efficiency is achieved using standard distributed memory techniques; locally numbered submatrices, ghost variables, and message information computed by the transformation function are maintained by each processor so that local calculations and communication of data dependencies is fast. Additionally, Aztec takes advantage of advanced partitioning techniques (Chaco) and utilizes efficient dense matrix algorithms when solving block sparse matrices.

8.6  BEARCLAW

http://www.amath.unc.edu/Faculty/mitran/bearclaw.html

A general purpose package for solving time dependent PDEs featuring:

8.7  BLACS

http://www.netlib.org/blacs/

The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that may be implemented efficiently and uniformly across a large range of distributed memory platforms.
The length of time required to implement efficient distributed memory algorithms makes it impractical to rewrite programs for every new parallel machine. The BLACS exist in order to make linear algebra applications both easier to program and more portable. It is for this reason that the BLACS are used as the communication layer of ScaLAPACK.
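
To make the process-grid idea concrete, the sketch below uses the C wrappers to the BLACS (the Cblacs_* routines shipped with common BLACS/ScaLAPACK builds) to arrange the available processes into a 2 x 2 grid. The grid shape and the hand-written prototypes are assumptions of this example, since header files vary between installations.

#include <stdio.h>

/* Prototypes for the C BLACS wrappers; some installations supply a header. */
void Cblacs_pinfo(int *mypnum, int *nprocs);
void Cblacs_get(int context, int request, int *value);
void Cblacs_gridinit(int *context, char *order, int nprow, int npcol);
void Cblacs_gridinfo(int context, int *nprow, int *npcol, int *myrow, int *mycol);
void Cblacs_gridexit(int context);
void Cblacs_exit(int error);

int main(void)
{
    int iam, nprocs, context;
    int nprow = 2, npcol = 2, myrow, mycol;   /* run with at least 4 processes */

    Cblacs_pinfo(&iam, &nprocs);              /* my id and the process count    */
    Cblacs_get(-1, 0, &context);              /* obtain the default system context */
    Cblacs_gridinit(&context, "Row", nprow, npcol);     /* form a 2x2 grid      */
    Cblacs_gridinfo(context, &nprow, &npcol, &myrow, &mycol);

    if (myrow >= 0) {                         /* processes outside the grid get -1 */
        printf("process %d sits at grid position (%d,%d)\n", iam, myrow, mycol);
        Cblacs_gridexit(context);
    }
    Cblacs_exit(0);                           /* 0 = also shut down the underlying MPI */
    return 0;
}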

8.8  BLZPACK

http://crd.lbl.gov/~osni/

An implementation of the block Lanczos algorithm for the solution of standard and generalized eigenvalue problems. The development of this eigensolver was motivated by the need to solve large, sparse, generalized problems from free vibration analyses in structural engineering. Several upgrades were performed afterwards aiming at the solution of eigenvalue problems from a wider range of applications.

8.9  Cactus

http://www.cactuscode.org/

An open source problem solving environment designed for scientists and engineers. Its modular structure easily enables parallel computation across different architectures and collaborative code development between different groups.
The name Cactus comes from the design of a central core (or "flesh") which connects to application modules (or "thorns") through an extensible interface. Thorns can implement custom developed scientific or engineering applications, such as computational fluid dynamics. Other thorns from a standard computational toolkit provide a range of computational capabilities, such as parallel I/O, data distribution, or checkpointing.
Cactus runs on many architectures. Applications, developed on standard workstations or laptops, can be seamlessly run on clusters or supercomputers. Cactus provides easy access to many cutting edge software technologies being developed in the academic research community, including the Globus Metacomputing Toolkit, HDF5 parallel file I/O, the PETSc scientific library, adaptive mesh refinement, web interfaces, and advanced visualization tools.

8.10  CAM

http://www.ccsm.ucar.edu/models/atm-cam/

The Community Atmosphere Model (CAM) serves as the atmospheric component of the Community Climate System Model (CCSM). CAM-2.0.1 is the latest in a series of global atmosphere models, previously known as the Community Climate Model (CCM), developed for the weather and climate research communities. CAM2 includes the new CCSM land surface model, the Community Land Model (CLM2.0). CLM2.0 replaces the previous land model, LSM1.

Documentation:

User's Guide to NCAR CAM2.0
http://www.ccsm.ucar.edu/models/atm-cam/UsersGuide/
Scientific Description of CAM 2.0
http://www.ccsm.ucar.edu/models/atm-cam/docs/description/index.html

8.11  Chombo

http://seesar.lbl.gov/ANAG/chombo/index.html

The Chombo package provides a set of tools for implementing finite difference methods for the solution of partial differential equations on block-structured adaptively refined rectangular grids. Both elliptic and time-dependent modules are included. Support for parallel platforms and standardized self-describing file formats are included.
Chombo provides a distributed infrastructure for parallel calculations over block-structured, adaptively refined grids. Chombo's design is uniquely flexible and accessible. Any collaborator will be able to develop parallel applications to solve the partial differential equations in which she is interested with far shorter development times than would be possible without the infrastructure. Very careful design and documentation allows said collaborator to enter the software at many levels. She will be able to use Chombo to investigate deep technical issues of adaptive mesh refinement algorithms or to simply adapt the example applications to solve different scientific problems.

8.12  CLAWPACK

http://www.amath.washington.edu/~claw/

CLAWPACK (Conservation LAWs PACKage) is a package of Fortran routines for solving time-dependent hyperbolic systems of PDEs in 1-, 2- and 3-D, including nonlinear systems of conservation laws. The software can also be used to solve nonconservative hyperbolic systems and systems with variable coefficients, as well as systems including source terms. It includes an MPI version in which the domain can be distributed among multiple processors, and adaptive mesh refinement versions (AMRCLAW) in 2- and 3-D.

8.13  CTSim

http://www.ctsim.org/

CTSim simulates the process of transmitting X-rays through phantom objects. These X-ray data are called projections. CTSim reconstructs the original phantom image from the projections using a variety of algorithms. Additionally, CTSim has a wide array of image analysis and image processing functions.

8.14  DAGH

http://www.cs.utexas.edu/users/dagh/

DAGH (which stands for Distributed Adaptive Grid Hierarchy) was developed as a computational toolkit for the Binary Black Hole NSF Grand Challenge Project. It provides the framework to solve systems of partial differential equations using adaptive finite difference methods. The computations can be executed sequentially or in parallel according to the specification of the user. DAGH also provides a programming interface so that these computations can be performed by traditional Fortran 77 and Fortran 90 or C and C++ kernels.

8.15  Dakota

http://endo.sandia.gov/DAKOTA/software.html

The DAKOTA (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible, extensible interface between analysis codes and iterative systems analysis methods. DAKOTA contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, analytic reliability, and stochastic finite element methods; parameter estimation with nonlinear least squares methods; and sensitivity/main effects analysis with design of experiments and parameter study capabilities. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the DAKOTA toolkit provides a flexible and extensible problem-solving environment as well as a platform for research and rapid prototyping of advanced solution methodologies.

8.16  DPMTA

http://www.ee.duke.edu/~wrankin/Dpmta/

DPMTA is a portable implementation of the Parallel Multipole Tree Algorithm (PMTA) which runs in a distributed environment under the PVM and MPI toolsets. DPMTA provides application programmers with an easy-to-use interface to compute full N-body interaction solutions.

8.17  EGO

http://www.lrz-muenchen.de/~heller/ego/index.html

EGO is a program to perform molecular dynamics simulations on parallel as well as on sequential computers. It was developed for the simulation of large molecular systems on parallel computers (under PVM, MPI or PARIX). EGO uses a multiple time step algorithm combined with a structure adapted fast multipole method for the description of long range electrostatics. The method has been demonstrated to scale linearly with the number of atoms in the range of about 40,000 atoms.

8.18  EPIC

http://atmos.nmsu.edu/data_and_services/software/epic/epic.htm

The Explicit Planetary Isentropic Coordinate atmospheric model. A general circulation model designed for planetary atmospheric studies.

8.19  FermiQCD

http://www.phoenixcollective.org/mdp/index_fermiqcd.html

FermiQCD is a collection of classes, functions and parallel algorithms for lattice QCD written in C++. It is based on Matrix Distributed Processing (MDP). The latter is a library that includes C++ methods for matrix manipulation, advanced statistical analysis (such as Jackknife and Bootstrap) and optimized algorithms for inter-process communications of distributed lattices and fields. These communications are implemented using the Message Passing Interface (MPI), but the MPI calls are hidden from the high-level algorithms that constitute FermiQCD.

8.20  GADGET

http://www.mpa-garching.mpg.de/gadget/

GADGET is a freely available code for cosmological N-body/SPH simulations on serial workstations, or on massively parallel computers with distributed memory. The parallel version of GADGET uses an explicit communication model that is implemented with the standardized MPI communication interface.

8.21  GASNet

http://www.cs.berkeley.edu/~bonachea/gasnet/index.html

GASNet is a language-independent, low-level networking layer that provides network-independent, high-performance communication primitives tailored for implementing parallel global address space SPMD languages such as UPC and Titanium. The interface is primarily intended as a compilation target and for use by runtime library writers (as opposed to end users), and the primary goals are high performance, interface portability, and expressiveness. GASNet stands for "Global-Address Space Networking". MPI is one of several networking conduits over which GASNet can be used.

Documentation:

GASNet Specification:
http://www.cs.berkeley.edu/~bonachea/gasnet/dist/docs/gasnet.html

8.22  GFS

http://gfs.sourceforge.net/

Gerris is an Open Source Free Software library for the solution of the partial differential equations describing fluid flow. The features include:

8.23  GS2

http://gs2.sourceforge.net/

GS2 is a physics application, developed to study low-frequency turbulence in magnetized plasma. It is typically used to assess the microstability of plasmas produced in the laboratory and to calculate key properties of the turbulence which results from instabilities. It is also used to simulate turbulence in plasmas which occur in nature, such as in astrophysical and magnetospheric systems.

8.24  HDF

http://hdf.ncsa.uiuc.edu/HDF5/

A general purpose library and file format for storing scientific data. HDF5 was created to address the data management needs of scientists and engineers working in high performance, data intensive computing environments. As a result, the HDF5 library and format emphasize storage and I/O efficiency. For instance, the HDF5 format can accommodate data in a variety of ways, such as compressed or chunked. And the library is tuned and adapted to read and write data efficiently on parallel computing systems.
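
As a small taste of the C interface, the sketch below writes a 4 x 6 integer array to an HDF5 file. It follows the HDF5 1.6-era API that was current when this document was written; newer releases accept the same calls through their 1.6 compatibility macros, so treat the example as illustrative rather than version-exact.

#include <hdf5.h>

int main(void)
{
    hid_t   file, space, dset;
    hsize_t dims[2] = {4, 6};
    int     data[4][6], i, j;

    for (i = 0; i < 4; i++)                 /* fill a small test array */
        for (j = 0; j < 6; j++)
            data[i][j] = i * 6 + j;

    file  = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    space = H5Screate_simple(2, dims, NULL);            /* 4 x 6 dataspace */
    dset  = H5Dcreate(file, "/dset", H5T_NATIVE_INT, space, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    H5Dclose(dset);                         /* release objects and flush the file */
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}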

8.25  HYCOM

http://hycom.rsmas.miami.edu/

The HYbrid Coordinate Ocean Model.

8.26  HYPRE

http://www.llnl.gov/CASC/hypre/

Hypre is a library for solving large, sparse linear systems of equations on massively parallel computers. It provides various conceptual interfaces to enable application users to access the library in the way they naturally think about their problems. It can be used both as a solver package and as a framework for algorithm development. Its object model is more general and flexible than the current generation of solver libraries.
The conceptual interfaces provided by Hypre are:

8.27  iMOOSE

http://imoose.sourceforge.net/

A general purpose application framework for the development of Finite Element solvers and related tools. It mainly focuses on electromagnetic problems arising from the design of electrical machines.

8.28  ISIS++

http://z.ca.sandia.gov/isis/

ISIS++ is a portable, object-oriented framework for solving sparse systems of linear equations. The framework includes implementations of a number of Krylov subspace iterative solution methods and preconditioners, as well as both uni-processor and multi-processor matrix and vector classes. Though ISIS++ was developed to solve systems of equations originating from large-scale, 3-D, finite element analysis models, it has applications in many other fields.

8.29  ISTV

http://www.erc.msstate.edu/vail/projects/ISTV/

ISTV is a visualization system for time-varying, three-dimensional, structured data sets. It was developed for the large data sets generated by ocean circulation models, but the software was designed as a general package which could be used on data from other domains. For example, most of the ocean model data are on grids which are rectilinear in latitude and longitude, but this is not a requirement of the system. In fact, ISTV can handle data on curvilinear grids. ISTV has been used with data from a wide variety of sources other than ocean model data, such as aerospace simulations, electromagnetic simulations, and medical data.

8.30  LAMMPS

http://www.cs.sandia.gov/~sjplimp/lammps.html

LAMMPS is a classical molecular dynamics (MD) code created for simulating molecular and atomic systems such as proteins in solution, liquid crystals, polymers, zeolites, or simple Lennard-Jonesium. It was designed for distributed-memory parallel computers and runs on any parallel platform that supports the MPI message-passing library or on single-processor desktops or laptops.

8.31  LFC

http://icl.cs.utk.edu/lfc/

LFC is a software project that merges the ease of use of LAPACK with the parallel processing capabilities of ScaLAPACK, without the latter's software dependencies beyond BLAS and an MPI implementation. It is a self-contained package with built-in knowledge of how to run linear algebra software on a cluster.

8.32  libMesh

http://libmesh.sourceforge.net/

A C++ framework for the numerical solution of PDEs on serial and parallel platforms. This requires PETSc.

8.33  LMPI

http://www.lrz-muenchen.de/services/software/parallel/lmpi/

A wrapper library for the standard MPI library for post-mortem analysis of parallel programs. It is based on the MPICH sources and supports profiling of C as well as FORTRAN MPI programs. A logfile is produced during the run of a profiled MPI program. The logfile can be used for visualization and optimization of the communication behavior of MPI applications.

8.34  LOCA

http://www.cs.sandia.gov/projects/loca/main.html

LOCA is a new software library for performing bifurcation analysis of large-scale applications. LOCA (which is written in "C") is designed to drive application codes that use Newton's method to locate steady-state solutions to nonlinear problems. The algorithms are chosen to work for large problems, such as those that arise from discretizations of partial differential equations, and to run on distributed memory parallel machines.
The approach in LOCA for locating and tracking bifurcations begins with augmenting the residual equations defining a steady state with additional equations that describe the bifurcation. A Newton method is then formulated for this augmented system; however, instead of loading up the Jacobian matrix for the entire augmented system (a task that involves second derivatives and dense matrix rows), bordering algorithms are used to decompose the linear solve into several solves with smaller matrices. Almost all of the algorithms just require multiple solves of the Jacobian matrix for the steady state problem to calculate the Newton updates for the augmented system. This greatly simplifies the implementation, since this is the same linear system that an application code using Newton's method will already have invested in.
The algorithms available in LOCA include zero-order, first-order, arc length, multi-parameter, turning point, pitchfork bifurcation, Hopf bifurcation and phase transition continuation, as well as eigenvalue approximation (via ARPACK).

8.35  MADCOW

http://www.mrao.cam.ac.uk/software/madcow/

A set of parallelised programs written in ANSI C and Fortran 77 that perform a maximum likelihood analysis of visibility data from interferometers observing the cosmic microwave background (CMB) radiation. This software is being used to produce power spectra of the CMB with the Very Small Array (VSA) telescope.

8.36  magpar

http://magnet.atp.tuwien.ac.at/scholz/magpar/download/

A finite element micromagnetics package. This requires PETSc.

8.37  MARMOT

http://www.hlrs.de/people/mueller/projects/marmot/

Tools for analyzing and checking MPI programs.

8.38  MDP

http://www.phoenixcollective.org/mdp/index_mdp.html

Matrix Distributed Processing (MDP) is a toolkit for the fast development of parallel applications. The programming tools include classes and algorithms for matrices, random number generators, distributed lattices (with arbitrary topology), fields and parallel iterations. MDP is based on MPI, but no knowledge of MPI or any other message passing protocol is required in order to use it.

8.39  MGRIDGEN

http://www-users.cs.umn.edu/~moulitsa/software.html

MGRIDGEN is a serial library written entirely in ANSI C that implements (serial) algorithms for obtaining a sequence of successive coarse grids that are well-suited for geometric multigrid methods. The quality of the elements of the coarse grids is optimized using a multilevel framework. It is portable on most Unix systems that have an ANSI C compiler.
PARMGRIDGEN is an MPI-based parallel library that is based on the serial package MGRIDGEN. PARMGRIDGEN extends the functionality provided by MGRIDGEN and is especially suited for large scale numerical simulations. It is written entirely in ANSI C and MPI and is portable to most parallel computers that support MPI.

8.40  MITgcm

http://mitgcm.org/

The MITgcm (MIT General Circulation Model) is a numerical model designed for study of the atmosphere, ocean, and climate. Its non-hydrostatic formulation enables it to simulate fluid phenomena over a wide range of scales; its adjoint capability enables it to be applied to parameter and state estimation problems. By employing fluid isomorphisms, one hydrodynamical kernel can be used to simulate flow in both the atmosphere and ocean.

Documentation:

User's Manual
http://mitgcm.org/sealion/
Development HOWTO
http://mitgcm.org/devel_HOWTO/devel_HOWTO_onepage/

8.41  MM5

http://box.mmm.ucar.edu/mm5/

A limited-area, nonhydrostatic, terrain-following sigma-coordinate model designed to simulate or predict mesoscale atmospheric circulation.

8.42  MOUSE

http://www.vug.uni-duisburg.de/MOUSE/

MOUSE is an object oriented framework for finite volume computations on unstructured grids. Right now it is mainly targeted at people who want to develop specialized numerical programs. One of the main objectives has been to ease the use of unstructured grids for finite volume codes.

8.43  mpiBLAST

http://mpiblast.lanl.gov/index.html

A freely available open source parallelization of NCBI BLAST. mpiBLAST segments the BLAST database and distributes it across cluster nodes, permitting BLAST queries to be processed on many nodes simultaneously. mpiBLAST is based on MPI.

8.44  MPB

http://ab-initio.mit.edu/mpb/

The MIT Photonic-Bands (MPB) package is a free program for computing the band structures (dispersion relations) and electromagnetic modes of periodic dielectric structures, on both serial and parallel computers. This program computes definite-frequency eigenstates of Maxwell's equations in periodic dielectric structures for arbitrary wavevectors, using fully-vectorial and three-dimensional methods. It is especially designed for the study of photonic crystals (a.k.a. photonic band-gap materials), but is also applicable to many other problems in optics, such as waveguides and resonator systems. (For example, it can solve for the modes of waveguides with arbitrary cross-sections.)
The features of MPB include:

8.45  mpiP

http://www.llnl.gov/CASC/mpip/

A lightweight profiling library for MPI applications. Because it only collects statistical information about MPI functions, mpiP generates considerably less overhead and much less data than tracing tools. All the information captured by mpiP is task-local. It only uses communication at the end of the application experiment to merge results from all of the tasks into one output file.

8.46  MPP

http://www.gfdl.gov/~vb/

A modular parallel computing infrastructure.

8.47  MUMPS

http://www.enseeiht.fr/lima/apo/MUMPS/

A multifrontal massively parallel sparse direct solver whose features include:

8.48  NaSt3DGP

http://wissrech.iam.uni-bonn.de/research/projects/NaSt3DGP/

An implementation of a Chorin-type projection method for the solution of the 3-D Navier-Stokes equations.

8.49  NetPIPE

http://www.scl.ameslab.gov/netpipe/

NetPIPE is a protocol independent performance tool that encapsulates the best of ttcp and netperf and visually represents the network performance under a variety of conditions. It performs simple ping-pong tests, bouncing messages of increasing size between two processes, whether across a network or within an SMP system. Message sizes are chosen at regular intervals, and with slight perturbations, to provide a complete test of the communication system. Each data point involves many ping-pong tests to provide an accurate timing.
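
The heart of such a ping-pong test is only a few lines of MPI. The fragment below is a deliberately simplified illustration of the idea, not NetPIPE itself; the fixed message size and repetition count are arbitrary choices for the example.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int nbytes = 1024, reps = 1000;   /* arbitrary size and repeat count */
    char *buf = malloc(nbytes);
    int rank, i;
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {        /* bounce the message between ranks 0 and 1 */
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("average round-trip time for %d bytes: %g microseconds\n",
               nbytes, (t1 - t0) / reps * 1.0e6);

    free(buf);
    MPI_Finalize();
    return 0;
}

NetPIPE automates exactly this kind of measurement over a sweep of message sizes and reports the resulting bandwidth and latency curves.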

8.50  OPT++

http://csmr.ca.sandia.gov/projects/opt++/opt++.html

OPT++ is a library of nonlinear optimization algorithms written in C++. The motivation for this package is to build an environment for the rapid prototyping and development of new optimization algorithms. In particular, the focus is on robust and efficient algorithms for problems in which the function and constraint evaluations require the execution of an expensive computer simulation. Currently, OPT++ includes the classic Newton methods, a nonlinear interior-point method, parallel direct search, a trust region - parallel direct search hybrid, and a wrapper to NPSOL. Between these methods, a wide range of problems can be solved, e.g. with or without constraints, with or without analytic gradients, simulation based, etc.

8.51  Overture

http://acts.nersc.gov/overture/main.html

Overture is a set of object-oriented tools for solving computational fluid dynamics and combustion problems in complex moving geometries. It has been designed for solving problems on a structured grid or a collection of structured grids. It can use curvilinear grids, adaptive mesh refinement, and the composite overlapping grid method to represent problems involving complex domains with moving components.
Overture programs are written at a very high level, using data-parallel array expressions in the style of HPF. They can achieve high performance (comparable to FORTRAN) thanks to a preprocessor (C++ to C++) called ROSE. Effectively, ROSE is a replacement for the expression template technique of POOMA. Overture has aggregate array operations and tightly integrated graphical features based on OpenGL. AMR++, a package that directly supports adaptive mesh refinement methods, is built on top of Overture.

Documentation:

A++/P++ Manual
http://www.llnl.gov/casc/Overture/henshaw/documentation/App/manual/manual.html

8.52  PALM

http://www.cerfacs.fr/~palm/

A system for creating complex modular and parallel applications. It was originally designed to handle operational data assimilation applications, but is much more generally applicable.

8.53  PARAMESH

http://ct.gsfc.nasa.gov/paramesh/Users_manual/amr.html

PARAMESH is a package of Fortran 90 subroutines designed to provide an application developer with an easy route to extend an existing serial code which uses a logically cartesian structured mesh into a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, it can operate as a domain decomposition tool for users who want to parallelize their serial codes, but who do not wish to use adaptivity.
PARAMESH builds a hierarchy of sub-grids to cover the computational domain, with spatial resolution varying to satisfy the demands of the application. These sub-grid blocks form the nodes of a tree data-structure (quad-tree in 2D or oct-tree in 3D). Each grid block has a logically cartesian mesh.

8.54  ParaSol

http://www.cs.purdue.edu/research/PaCS/parasol.html

A parallel discrete event simulation system that supports optimistic and adaptive synchronization methods.

8.55  ParMETIS

http://www-users.cs.umn.edu/~karypis/metis/parmetis/

An MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs, meshes, and for computing fill-reducing orderings of sparse matrices. ParMETIS extends the functionality provided by METIS and includes routines that are especially suited for parallel AMR computations and large scale numerical simulations. The algorithms implemented in ParMETIS are based on parallel multilevel k-way graph-partitioning algorithms, adaptive repartitioning algorithms, and parallel multi-constrained algorithms.
ParMETIS provides five major functions:

8.56  pARMS

http://www-users.cs.umn.edu/~saad/software/pARMS/

A library of parallel solvers for distributed sparse linear systems of equations. It is based on a preconditioned Krylov subspace approach, using a domain decomposition viewpoint. It offers a large selection of preconditioners for distributed sparse linear systems and a few of the best known accelerators.

8.57  PARPACK

http://www.caam.rice.edu/~kristyn/parpack_home.html

ARPACK is a collection of Fortran77 subroutines designed to solve large scale eigenvalue problems. PARPACK is a parallel version of ARPACK that uses BLACS and MPI for parallelization.
ARPACK software is capable of solving large scale symmetric, nonsymmetric, and generalized eigenproblems from significant application areas. The software is designed to compute a few (k) eigenvalues with user specified features such as those of largest real part or largest magnitude. Storage requirements are on the order of n*k locations. No auxiliary storage is required. A set of Schur basis vectors for the desired k-dimensional eigen-space is computed which is numerically orthogonal to working precision. Numerically accurate eigenvectors are available on request.

8.58  ParVox

http://pat.jpl.nasa.gov/public/ParVox/

ParVox is a parallel volume rendering system using the splatting algorithm.

8.59  PaStiX

http://dept-info.labri.u-bordeaux.fr/~ramet/pastix/

A parallel direct solver for very large sparse symmetric positive definite systems of linear equations.

8.60  PETSc

http://www.mcs.anl.gov/petsc/

PETSc is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations. It employs the MPI standard for all message-passing communication. PETSc is intended for use in large-scale application projects, and several ongoing computational science projects are built around the PETSc libraries. With strict attention to component interoperability, PETSc facilitates the integration of independently developed application modules, which often most naturally employ different coding styles and data structures.
PETSc is easy to use for beginners. Moreover, its careful design allows advanced users to have detailed control over the solution process. PETSc includes an expanding suite of parallel linear and nonlinear equation solvers that are easily used in application codes written in C, C++, and Fortran. PETSc provides many of the mechanisms needed within parallel application codes, such as simple parallel matrix and vector assembly routines that allow the overlap of communication and computation. In addition, PETSc includes growing support for distributed arrays.
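
The flavour of the library is easy to convey with a small vector example. The sketch below follows the current PETSc 3.x C interface; the PETSc 2.x releases contemporary with this document use slightly different argument conventions for a few of these calls, so treat it as illustrative rather than version-exact.

#include <petscvec.h>

int main(int argc, char **argv)
{
    Vec       x;
    PetscReal norm;

    PetscInitialize(&argc, &argv, NULL, NULL);

    VecCreate(PETSC_COMM_WORLD, &x);        /* parallel vector over all ranks        */
    VecSetSizes(x, PETSC_DECIDE, 100);      /* global size 100, local sizes by PETSc */
    VecSetFromOptions(x);
    VecSet(x, 1.0);                         /* set every entry to 1                  */
    VecNorm(x, NORM_2, &norm);              /* collective 2-norm                     */

    PetscPrintf(PETSC_COMM_WORLD, "||x|| = %g\n", (double)norm);

    VecDestroy(&x);
    PetscFinalize();
    return 0;
}
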
The main PETSc components are:

8.60.1  PETSc Applications

http://www-unix.mcs.anl.gov/petsc/petsc-2/publications/petscapps.html

8.61  PHAML

http://math.nist.gov/phaml/

The primary goal of the PHAML project is to produce a parallel version of MGGHAT. MGGHAT is a sequential program for the solution of 2D elliptic partial differential equations using low or high order finite elements, adaptive mesh refinement based on newest node bisection of triangles, and multigrid. All aspects of the method are based on the hierarchical basis functions.
The PHAML code is being developed as a prototype implementation. It is written in Fortran 90, which provides modules for modularity and data abstraction, optional arguments for a flexible user interface, and many other useful language features. Visualization is obtained through the OpenGL(R) graphics library. Many viewing options are available for the refinement trees, grids and solutions, along with zooming, panning and rotating capabilities. PHAML uses either a master/slave or SPMD model of parallel computation. Message passing is performed with either PVM or MPI.

8.62  PIKAIA

http://whitedwarf.org/index.html?parallel/&0

PIKAIA is a genetic algorithm based optimization program. It incorporates only the two basic genetic operators: uniform one-point crossover, and uniform one-point mutation. The encoding within PIKAIA is based on a decimal alphabet made of the 10 simple integers (0 through 9); this is because binary operations are usually carried out through platform-dependent functions in FORTRAN. Three reproduction plans are available: Full generational replacement, Steady-State-Delete-Random, and Steady-State-Delete-Worst. Elitism is available and is a default option. The mutation rate can be dynamically controlled by monitoring the difference in fitness between the current best and median in the population (also a default option). Selection is rank-based and stochastic, making use of the Roulette Wheel Algorithm. PIKAIA is supplied with a ranking subroutine based on the Quicksort algorithm, and a random number generator based on the minimal standard Lehmer multiplicative linear congruential generator.

8.63  PLANSO

http://www.nersc.gov/research/SIMON/planso.html

Implements a Lanczos iteration for symmetric generalized eigenvalue problems.

8.64  PLASIM

http://puma.dkrz.de/planet/

A coupled system of climate components for simulating the climates of Earth, Mars and Titan.

8.65  PMESA

http://www.cs.sandia.gov/VIS/pmesa.html

A parallel version of Mesa, a free implementation of OpenGL. TNT_PMESA is a library that can be called by parallel (or non-parallel) surface-creation programs to render the surfaces they create.

8.66  POP

http://climate.lanl.gov/Models/POP/index.htm

POP is an ocean circulation model derived from earlier models of Bryan, Cox, Semtner and Chervin in which depth is used as the vertical coordinate. The model solves the three-dimensional primitive equations for fluid motions on the sphere under hydrostatic and Boussinesq approximations. Spatial derivatives are computed using finite-difference discretizations which are formulated to handle any generalized orthogonal grid on a sphere, including dipole and tripole grids which shift the North Pole singularity into land masses to avoid time step constraints due to grid convergence.
Although POP was originally developed for the Connection Machine, it was designed from the start for portability by isolating all routines involving communication into a small set (5) of modules which can be modified for specific architectures. Currently, versions of these routines exist for MPI and SHMEM communication libraries and also for serial execution. The appropriate directory is chosen at compile time and no pre-processor directives are used to support different machines. Support for hybrid programming using threads and message passing has recently been added and is described in the User's Guide.

8.67  PPAT

http://pauillac.inria.fr/cdrom/www/ppat/description.htm

PPAT (Parallel Path following Algorithm using Triangles) is a parallel tool for the computation of pseudospectra. The underlying technique uses a reliable level curve tracing algorithm to compute the boundary of the pseudospectrum. The path following algorithm offers total reliability and can handle singularities along the level curve without difficulty. It offers a guarantee of termination even in the presence of round-off errors and makes use of a large granularity for parallelism to achieve large speed-ups and high efficiency. The software is able to trace multiple level curves independently; furthermore, the ability to compute multiple slices of the same level curve simultaneously enhances its speed-ups and efficiency. The user drives the parallel application through a graphical user interface; the interface includes the graphical and operational features crucial for the appreciation of the information provided by the pseudospectra.

8.68  Prometheus

http://www.cs.berkeley.edu/~madams/prom_intro.html

A highly parallel multigrid solver for the set of linear algebraic equations that arise from three dimensional finite element discretizations of PDEs. Prometheus is implemented in C++ and callable from C, C++ and FORTRAN, and is built on PETSc and ParMetis.
Prometheus uses multigrid to condition Krylov subspace methods to solve nonsingular matrix equations. Two fully parallelized unstructured multigrid methods are implemented. The first automatically constructs the coarse grids using an algorithm based on maximal independent sets and Delaunay tessellation, and constructs the restriction operators using linear finite element shape functions on the coarse grid. The second method is an algebraic or smoothed aggregation method that provides similar or better performance, and is a simpler and more robust algorithm.
The Prometheus library is designed to fit into existing finite element packages as easily as possible and still provide an effective solver for challenging large scale applications. Prometheus requires that the user provide the finite element mesh - for the fine grid only - in parallel (as well as the matrix and right hand side). Prometheus also provides parallel algebraic multigrid support (the matrix triple product and simplified interface to PETSc). Prometheus constructs the coarse grid operators and PETSc solver objects and provides a simple user interface to solve linear equations.

8.69  PSPACES

http://www-users.cs.umn.edu/~mjoshi/pspases/index.html

PSPASES (Parallel SPArse Symmetric dirEct Solver) is a high performance, scalable, parallel, MPI-based library, intended for solving linear systems of equations involving sparse symmetric positive definite matrices. The library provides various interfaces to solve the system using four phases of direct method of solution: compute fill-reducing ordering, perform symbolic factorization, compute numerical factorization, and solve triangular systems of equations.

8.70  PUMA

http://puma.dkrz.de/puma/

The Portable University Model of the Atmosphere is a circulation model in FORTRAN-90 developed at the Meteorological Institute of the University of Hamburg. PUMA originated in a numerical prediction model that was altered to include only the most relevant processes in the atmosphere. It serves as a training tool for junior scientists, allowing them to work with a program that is easier to understand and modify than ECHAM.

8.71  QCDimMPI

http://insam.sci.hiroshima-u.ac.jp/QCDMPI/QCDimMPI.html

Pure QCD Monte Carlo simulation code with MPI.

8.72  RAMS

http://bridge.atmet.org/users/software.php

The Regional Atmospheric Modeling System is for numerical simulations of atmospheric meteorology and other environmental phenomena on scales from meters to 100s of kilometers.

8.73  RSL

http://www-unix.mcs.anl.gov/~michalak/rsl/

RSL is a parallel runtime system library developed at Argonne National Laboratory that is tailored to regular-grid atmospheric models with mesh refinement in the form of two-way interacting nested grids. RSL provides high-level stencil and interdomain communication, irregular domain decomposition, automatic local/global index translation, distributed I/O, and dynamic load balancing.
A unique feature of RSL is that processor subdomains need not be rectangular patches; rather, grid points are independently allocated to processors, allowing more precisely balanced allocation of work to processors. Communication mechanisms are tailored to the application: RSL provides an efficient high-level stencil exchange operation for updating subdomain ghost areas and interdomain communication to support two-way interaction between nest levels. RSL also provides run-time support for local iteration over subdomains, global-local index translation, and distributed I/O from ordinary Fortran record-blocked data sets. The interface to RSL supports Fortran77 and Fortran90.

8.74  SAMRAI

http://www.llnl.gov/CASC/SAMRAI/

SAMR is a particular approach to adaptive mesh refinement in which the computational grid is implemented as a collection of structured mesh components. The computational mesh consists of a hierarchy of levels of spatial and temporal mesh resolution. Typically, each level in the hierarchy corresponds to a single uniform degree of mesh spacing for a numerical method. However, each level may also employ a computational model different than other levels in the hierarchy. Within a SAMR hierarchy, levels are nested; that is, the coarsest level covers the entire computational domain and each successively finer level covers a portion of the interior of the next coarser level. Computational cells on each level are clustered to form a set of logically-rectangular patch regions. Simulation data is stored on these patches in contiguous arrays that map directly to the mesh cells without excessive indirection.
SAMR solution methods share characteristics with uniform, non-adaptive structured grid methods. In particular, the computation may be organized as a collection of numerical routines that operate on data defined over logically-rectangular regions and communication operations that pass information between those regions, for example, to fill "ghost cells". However, since an SAMR solution is constructed on a composite mesh, the numerical algorithm must treat internal mesh boundaries between coarse and fine levels properly to maintain a consistent solution state.

8.75  ScaLAPACK

http://www.netlib.org/scalapack/scalapack_home.html

The ScaLAPACK (or Scalable LAPACK) library includes a subset of LAPACK routines redesigned for distributed memory MIMD parallel computers. It is currently written in a Single-Program-Multiple-Data style using explicit message passing for interprocessor communication. ScaLAPACK is designed for heterogeneous computing and is portable on any computer that supports MPI or PVM.
Like LAPACK, the ScaLAPACK routines are based on block-partitioned algorithms in order to minimize the frequency of data movement between different levels of the memory hierarchy. (For such machines, the memory hierarchy includes the off-processor memory of other processors, in addition to the hierarchy of registers, cache, and local memory on each processor.) The fundamental building blocks of the ScaLAPACK library are distributed memory versions (PBLAS) of the Level 1, 2 and 3 BLAS, and a set of Basic Linear Algebra Communication Subprograms (BLACS) for communication tasks that arise frequently in parallel linear algebra computations. In the ScaLAPACK routines, all interprocessor communication occurs within the PBLAS and the BLACS. One of the design goals of ScaLAPACK was to have the ScaLAPACK routines resemble their LAPACK equivalents as much as possible.

8.76  SDPARA

http://sdpa.is.titech.ac.jp/sdpara.index.html

The SDPARA (SemiDefinite Programming Algorithm PARAllel version) is a parallel version of the SDPA. C++ source codes of the SDPARA are available. They form a stand-alone software package for solving SDPs in parallel with the help of MPI (Message Passing Interface) and ScaLAPACK (Scalable LAPACK).
The SDPA (SemiDefinite Programming Algorithm) is a software package for solving semidefinite programs (SDPs). It is based on a Mehrotra-type predictor-corrector infeasible primal-dual interior-point method. The SDPA handles the standard form SDP and its dual. It is implemented in C++ using LAPACK for matrix computation. The SDPA incorporates dynamic memory allocation and deallocation, so the maximum size of SDP that can be solved depends on the amount of memory installed in the user's computer.

8.77  SGOPT

http://www.cs.sandia.gov/SGOPT/

The SGOPT optimization library provides an object-oriented interface to a variety of optimization algorithms, especially stochastic optimization methods used for global optimization. This includes a generic class hierarchy for optimization and optimization problems. This class hierarchy includes a generic notion of asynchronous parallel execution for optimization problems, which is used by many SGOPT optimizers.
SGOPT includes the following global and local optimization methods:
SGOPT stands for Stochastic Global OPTimization; for expensive optimization problems, its global optimizers are best suited to identifying promising regions of the global design space. In multimodal design spaces, the combination of global identification (from SGOPT) with efficient local convergence (from a gradient-based algorithm) can be highly effective. The SGOPT methods are not gradient-based, which makes them appropriate for discrete problems as well as problems for which gradient information is unavailable or is of questionable accuracy due to numerical noise, etc.

8.78  SLEPc

http://www.grycap.upv.es/slepc/

The Scalable Library for Eigenvalue Problem Computations is a library for the solution of large scale sparse eigenvalue problems on parallel computers. It is built on top of PETSc and can be considered an extension of PETSc providing all the functionality needed to solve eigenvalue problems. It can be used for problems formulated in either standard or generalized form, with real or complex arithmetic, as well as for related problems such as the singular value decomposition.
The emphasis of the software is on methods and techniques appropriate for problems in which the associated matrices are sparse, e.g. those arising after the discretization of PDEs. Therefore, most of the methods offered by the library are projection methods or other methods with similar properties, e.g. Arnoldi, Lanczos and Subspace Iteration methods. SLEPc implements these as well as more sophisticated algorithms, and also provides built-in support for spectral transformations such as shift-and-invert.

8.79  SMS

http://www-ad.fsl.noaa.gov/ac/sms.html

SMS is a directive-based parallelization tool that translates Fortran code into a parallel version that runs efficiently on both shared and distributed memory systems. This software has been used successfully since 1993 to parallelize and run many oceanic and atmospheric models, some of which produce weather forecasts for the National Weather Service.
These models contain structured regular grids that are resolved using either finite difference approximation or Gauss-Legendre spectral methods. SMS also provides support for mesh refinement, and can transform data between grids that have been decomposed differently (e.g. grid and spectral space). While the tool has been tailored toward finite difference approximation and spectral weather and climate models, the approach is sufficiently general to be applied to other structured grid codes. As the development of SMS has matured, the time and effort required to parallelize codes for MPPs has been reduced significantly. Code parallelization has become simpler because SMS provides support for advanced operations including incremental parallelization and parallel debugging.
SMS provides a number of performance optimizations. The SMS run-time libraries have been optimized to speed inter-processor communications using techniques such as aggregation. Array aggregation permits multiple model variables to be combined into a single communications call to reduce message-passing latency. SMS also allows the user to perform computations in the halo region to reduce communications. High performance I/O is also provided by SMS. Since atmospheric models typically output forecasts several times during a model run, SMS can output these data asynchronously with model execution. These optimizations can lead to significantly faster execution times.

8.80  Snark

http://www.vpac.org/VDT/Geoscience/about/about_snark.php

An extensible framework for building finite element/particle-in-cell applications, customizations and extensions. It began as a redesign of Louis Moresi's Ellipsis FEM code, so one of its aims is to be a freely available, 3D, extensible, scalable and parallel version of Ellipsis. The key aims of the framework are to offer various methods of solving finite element and particle-in-cell equations, to model complex rheologies while allowing users to substitute their own, and to control the precise set-up and application of initial and boundary conditions.
Snark was designed to run in parallel from its inception. Another of its advantages over many other physical solvers is its hybrid Finite-Element/Particle-In-Cell method, which can use fast implicit solving techniques yet can also track the movement of material properties, such as the type of rock or mineral. Snark also comes with built-in modelling of several viscosity models, stress-dependence, and yielding, and allows the precise set-up of initial boundary and temperature conditions.

8.81  SPAI

http://www.inf.ethz.ch/personal/broeker/spai/

A sparse, iterative solver package. The SPAI algorithm explicitly computes a sparse approximate inverse which can then be applied as a preconditioner to an iterative method. The sparsity pattern of the approximate inverse is not imposed a priori but captured automatically.

8.82  Sphinx

http://www.llnl.gov/CASC/sphinx/

Sphinx, an integrated parallel microbenchmark suite, consists of a harness for running performance tests and extensive tests of MPI, Pthreads and OpenMP.

8.83  S+

http://www.cs.ucsb.edu/projects/s+/

Sparse LU factorization with partial pivoting.

8.84  SPOOLES

http://www.netlib.org/linalg/spooles/spooles.2.2.html

A library for solving sparse real and complex linear systems of equations, written in the C language using object oriented design. The functionality includes:

8.85  SUNDIALS

http://acts.nersc.gov/sundials/main.html

SUNDIALS (SUite of Nonlinear and DIfferential/ALgebraic equation Solvers) refers to a family of closely related equation solvers. These solvers have some code modules in common, primarily a module of vector kernels and generic linear system solvers, including one based on a Scaled Preconditioned GMRES method. All of the solvers are suitable for either serial or parallel environments. Parallelization was accomplished by rewriting the module of vector kernels, whereby the parallel version of each kernel operates on vectors that have been distributed across processors. All message passing calls are made through MPI.

8.86  SuperLU

http://crd.lbl.gov/~xiaoye/SuperLU/

SuperLU is a general purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations on high performance machines. The library is written in C and is callable from either C or Fortran. The library routines will perform an LU decomposition with partial pivoting and triangular system solves through forward and back substitution. The LU factorization routines can handle non-square matrices but the triangular solves are performed only for square matrices. The matrix columns may be preordered (before factorization) either through library or user supplied routines. This preordering for sparsity is completely separate from the factorization. Working precision iterative refinement subroutines are provided for improved backward stability. Routines are also provided to equilibrate the system, estimate the condition number, calculate the relative backward error, and estimate error bounds for the refined solutions.

8.87  SWOM

http://geosci.uchicago.edu/~cdieterich/swom/

The Shallow Water Ocean Model (swom) is designed as a toolbox to develop Arakawa A through D grid numerics for the shallow water equations.

8.88  Towhee

http://www.cs.sandia.gov/projects/towhee/

A Monte Carlo molecular simulation code originally designed for the prediction of fluid phase equilibria using atom-based force fields and the Gibbs ensemble with particular attention paid to algorithms addressing molecule conformation sampling. The code has subsequently been extended to several ensembles, many different force fields, and solid (or porous) phases.

8.89  Trilinos

http://software.sandia.gov/trilinos/index.html

The Trilinos Project is an effort to develop parallel solver algorithms and libraries within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific applications.

8.89.1  AztecOO

http://software.sandia.gov/trilinos/packages.html

Preconditioned Krylov solver package.

8.89.2  Epetra

http://software.sandia.gov/trilinos/packages.html

The core linear algebra package, containing code for the construction and manipulation of distributed and serial graphs, sparse and dense matrices, vectors and multivectors.

8.89.3  IFPACK

http://software.sandia.gov/trilinos/packages/ifpack/

A suite of object-oriented algebraic preconditioners for use with iterative solvers.

8.89.4  ML

http://software.sandia.gov/trilinos/packages.html

A package of distributed memory algebraic preconditioners that provides multilevel, multigrid-like preconditioning for distributed linear systems.

8.89.5  TriUtils

http://software.sandia.gov/trilinos/packages.html

A package of utilities used by most of the Trilinos packages.

8.90  TRLan

http://www.nersc.gov/research/SIMON/trlan.html

A program designed to find a small number of extreme eigenvalues and their corresponding eigenvectors of a real symmetric matrix.

8.91  UG

http://cox.iwr.uni-heidelberg.de/~ug/

UG is a flexible software tool for the numerical solution of partial differential equations on unstructured meshes in two and three space dimensions using multigrid methods. Its basic design is flexible enough to support many different discretization schemes. The underlying distributed dynamic data programming model offers a smooth migration from sequential to parallel computing.

8.92  UPC

http://upc.lbl.gov/

Unified Parallel C (UPC) is an extension of the C programming language designed for high performance computing on large-scale parallel machines. The language provides a uniform programming model for both shared and distributed memory hardware. The programmer is presented with a single shared, partitioned address space, where variables may be directly read and written by any processor, but each variable is physically associated with a single processor. UPC uses a Single Program Multiple Data (SPMD) model of computation in which the amount of parallelism is fixed at program startup time, typically with a single thread of execution per processor. MPI is one of several available network APIs over which UPC can be used.
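
A minimal UPC fragment illustrating the partitioned shared address space looks like the sketch below; it is a generic example assuming nothing beyond a standard UPC compiler (for instance the Berkeley upcc/upcrun toolchain, which can run over MPI via GASNet).

#include <upc.h>
#include <stdio.h>

shared int count[THREADS];      /* one element with affinity to each thread */

int main(void)
{
    int i;

    /* each iteration runs on the thread that owns count[i] */
    upc_forall(i = 0; i < THREADS; i++; &count[i])
        count[i] = MYTHREAD;

    upc_barrier;

    if (MYTHREAD == 0)
        for (i = 0; i < THREADS; i++)
            printf("count[%d] was written by thread %d\n", i, count[i]);

    return 0;
}

Here count is a shared array with one element having affinity to each thread, and the upc_forall affinity expression &count[i] ensures each iteration is executed by the thread that owns the element it writes.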

Documentation:

User's Guide
http://upc.lbl.gov/docs/user/index.shtml
Tutorials
http://www.gwu.edu/~upc/tutorials.html

8.93  WAVEWATCH

http://polar.wwb.noaa.gov/waves/wavewatch/wavewatch.html

WAVEWATCH III is a third generation wave model developed at NOAA/NCEP in the spirit of the WAM model. WAVEWATCH III solves the spectral action density balance equation for wavenumber-direction spectra. The implicit assumption of this equation is that properties of the medium (water depth and current) as well as the wave field itself vary on time and space scales that are much larger than the variation scales of a single wave. A further constraint is that the parameterizations of physical processes included in the model do not address conditions where the waves are strongly depth-limited. These two basic assumptions imply that the model can generally be applied on spatial scales (grid increments) larger than 1 to 10 km, and outside the surf zone.

8.94  WRF

http://www.wrf-model.org/

A multi-institution effort to develop a next-generation national weather forecast model.

8.95  WSMP

http://www-users.cs.umn.edu/~agupta/wsmp.html

Watson Sparse Matrix Package (WSMP) is a collection of algorithms for efficiently solving large systems of linear equations whose coefficient matrices are sparse. This high-performance, robust, and easy-to-use software can be used as a serial package, or in a shared-memory multiprocessor environment, or as a scalable parallel solver in a message-passing environment, where each node can either be a uniprocessor or a shared-memory multiprocessor.

8.96  Zoltan

http://www.cs.sandia.gov/Zoltan/

The Zoltan Library provides critical data-management services to a wide range of parallel applications. Zoltan includes many utilities needed by unstructured and/or adaptive parallel applications. Zoltan's object-oriented interface is easy-to-use and enables Zoltan to be used by a number of different applications. Zoltan is designed to be flexible and extensible, so different algorithms can be used, compared and added easily.

8.97  ZPL

http://www.cs.washington.edu/research/zpl/index.html

ZPL is an array programming language designed from first principles for fast execution on both sequential and parallel computers. It provides a convenient high-level programming medium for supercomputers and large-scale clusters with efficiency comparable to hand-coded message passing.
ZPL is intended for engineering and scientific programs that would previously have been written in C or C++. Because its design goals were machine independence and high performance, ZPL programs run fast on both sequential and parallel computers. Because it is "implicitly parallel," i.e. the programmer does not express the parallelism, ZPL programs are simple and easy to write.

Chapter 9
Miscellaneous Odd Jobs

This section outlines various hardware and software problems we've encountered, along with possible solutions or procedures for working through them. Remember at all times that the devil is indeed in the details: all general procedures inevitably involve details that change from platform to platform, and a little trial and error will usually get you through.

9.1  RPM Source Packages

It's easy enough to deal with RPM binary packages. You just install them with a command resembling rpm -ivh thepackage-0.5.6.rpm. Working with source RPM packages - in particular building a binary RPM from a source RPM - is a bit trickier. Bruce Barlock wrote a quick primer on how to do this which can be found at:

http://www.aplawrence.com/Linux/rpm-bg.html

We'll extract the essential elements here.
First, we'll assume you've obtained a source RPM file, say, samba-2.0.8-1.src.rpm. This needs to be installed to the /usr/src/redhat tree for further processing. This tree is simply a subdirectory - usually in a standard location - that contains a set of subdirectories containing all the code and metacode needed to work with RPM files. A listing of a typical /usr/src/redhat tree will look a bit like this:

drwxr-xr-x    7 root     root         4096 May 19 04:23 .
drwxr-xr-x    6 root     root         4096 Jun  4 20:20 ..
drwxr-xr-x    5 root     root         4096 Jun  6 21:08 BUILD
drwxr-xr-x    8 root     root         4096 May 19 04:23 RPMS
drwxr-xr-x    2 root     root         4096 Jun  4 19:46 SOURCES
drwxr-xr-x    2 root     root         4096 Jun  6 21:07 SPECS
drwxr-xr-x    2 root     root         4096 Apr  8 18:43 SRPMS

Install the source file to this tree with the command:

rpm -ivh samba-2.0.8-1.src.rpm

Now move to the /usr/src/redhat/SPECS subdirectory where you'll find a samba.spec file. Build the binary by running the command:

rpmbuild -bb samba.spec

This will build a binary RPM and place it in the /usr/src/redhat/RPMS/i386 directory. (The i386 could instead be i486, i586, i686 or something else, depending on the specifics in the samba.spec file, but the binary will end up under one of these architecture subdirectories.) Note that in Barlock's documentation this command is given as rpm -bb, while here it is rpmbuild -bb; the build functions were apparently split out of rpm into the separate rpmbuild command at some point after 2001. Just find the binary and then do whatever you wanted to do with it, such as putting it in the appropriate place to add a package to your compute nodes.
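To recap, the whole sequence looks something like the sketch below; this assumes the standard /usr/src/redhat build tree, and the package name is just the running example.

rpm -ivh samba-2.0.8-1.src.rpm           # install the source RPM into the tree
cd /usr/src/redhat/SPECS
rpmbuild -bb samba.spec                  # build the binary RPM from the spec file
ls /usr/src/redhat/RPMS/*/samba-*.rpm    # the result lands under an arch subdirectory
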
This procedure assumes that you do not wish to make any changes to the compile process, that is, that you'll take the binary produced by the given source package as-is. If you want to change something in the compile process - for instance, add another configure or compile option - you'll need to edit the samba.spec file, but that's beyond our scope here. For more complicated tasks, either try a web search or hit the RPM home page at:

http://www.rpm.org/

9.2  Flashing the BIOS

You may someday be faced with a knotty hardware problem that can only be cleared up via installing a newer version of the BIOS. For those not wishing to deal with the Windows platform, Davor Ocelic has written a guide on how to flash the BIOS from the Linux platform.
First, you need a DOS boot image, since all known flashing utilities work only under DOS. You can search for such a boot image using the obvious terms, or use the one supplied by Davor, called win98-boot.img, available at:

http://colt.projectgamma.com/bios/flashing.html

You will need to add the BIOS image and flashing utility for your specific motherboard to this boot image. You can almost always find them at the manufacturer's web site with a little poking around. Download the files you find, for example AWFL822A.EXE and W6330VMS.360, and add them to the win98-boot.img file via:
mkdir tmp
mount -o loop -t vfat win98-boot.img tmp
cp AWFL822A.EXE W6330VMS.360 tmp/
umount tmp

and then create a boot floppy of your completed win98-boot.img via:
dd if=win98-boot.img of=/dev/fd0

Insert the boot floppy into your floppy drive and reboot your machine. If an operating system is already installed on your hard drive, and your floppy drive has not been set to appear before the hard drive in the boot order, you will need to change this during the reboot. This is usually accomplished by hitting the DELETE key right after the boot process starts, which brings up the BIOS setup screen, after which you can poke around and figure out how to reset the boot order.
After you've changed the boot order, if this was needed, reboot the machine again. This should - if everything's been done correctly and the floppy isn't faulty and the floppy drive isn't faulty (increasingly real probabilities, by the way) - bring up a DOS prompt. After you get the DOS prompt, follow whatever specific instructions are given for your particular BIOS. Usually this means executing one of the programs you placed on the floppy.
For those without a floppy drive, Davor also gives instructions on how to flash using a CD-ROM. This will be a bit trickier, though, since his boot image doesn't contain CD-ROM drivers. You will either have to locate them separately, and add them to his boot image via the process detailed above, or find another image that contains the drivers.
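One way to build such a bootable CD - a hedged sketch, assuming mkisofs and cdrecord are installed and that your burner device is /dev/cdrom, which will certainly vary from system to system - is to wrap the driver-augmented win98-boot.img in an El Torito bootable ISO and burn it:

mkdir bootcd
cp win98-boot.img bootcd/
# build an ISO that boots via floppy emulation from the embedded image
mkisofs -o bootcd.iso -b win98-boot.img bootcd/
# burn it (older cdrecord versions want a bus,target,lun triple instead of a device path)
cdrecord dev=/dev/cdrom bootcd.iso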

Chapter 10
Miscellaneous Documentation

10.1  Anaconda

The following Anaconda documentation was found in /usr/share/doc/anaconda-*.*. Additional information can be found at:

http://rhlinux.redhat.com/anaconda/

10.1.1  Overview

Anaconda is the name of the install program used by Red Hat Linux. It is python-based with some custom modules written in C. Being written in a scripting language makes development quicker, and it is easier to distribute updates in a non-binary form. The anaconda installer works on a wide variety of Linux-based computing architectures (ia32, Itanium, Alpha, S/390, PowerPC), and is designed to make it easy to add platforms.
The first stage of the installer is a loader program written in C. This program is responsible for loading all the kernel modules required to mount the second stage of the installer, which has a fairly complete Linux runtime environment. The loader is designed to be small to fit within the constraints of bootable media (floppies are small by modern standards). Once the loader has mounted the second stage image, the python installer is started up, and optionally, a graphical X Windows based environment.
The loader can install from local media (harddrive or CDROM), or from a network source, via FTP, HTTP, or NFS. The installer can pull updates for bugs or features via several sources as well. Finally, the installer has an auto-install mechanism called kickstart that allows installs to be scripted. The script can even be pulled from an HTTP source that can create kickstart configurations dynamically based on the machine which is requesting the script. This allows endless possibilities in automating large sets of servers.
This document's purpose is to go over technical details that will make using and customizing the installer, and the distribution, much easier. The anaconda installer arguably is one of the most flexible and powerful installers available, and hopefully this document will allow users to take advantage of this potential.

10.1.2  Install Mechanism Summary

The document 'install-methods.txt', which is distributed with the anaconda package, goes over the various ways the installer can be used. Essentially, the installer needs to access the contents of the CD images distributed with the product. The installer can either work with the CD images one at a time, or else from a single directory (the install 'tree') which has the contents of all the CD images copied into it. The latter is useful if you are customizing the packages in the distribution. The first stage of the installation process (the 'loader') is responsible for getting the system to the point where it can access the installation source, whether CD image or installation tree based.
For CDROM-based installs the loader detects the presence of a CD in a drive in the system with a distribution on it and jumps straight to the second stage. For other interactive (non-kickstart) installation methods the user is prompted for the installation source. For kickstart-based installs the installation source is specified in the kickstart file, and the user is not required to be present unless necessary information is missing from the kickstart script.
For NFS-based installs the installer mounts the directory specified and looks for a set of ISO images, or an installation tree. If present then a filesystem image is loopback-mounted and the second stage installer is run from this image. For FTP and HTTP installs a smaller (no graphical install options) second stage image is downloaded into memory, mounted, and the second stage installer run from this. On harddrive based installs a similar small second stage image is put into memory and the second stage installer run from it. This is necessary because, for partitioning to succeed, the installer cannot have any partitions on the harddrive mounted; otherwise the kernel would not be able to acknowledge partition table changes.
The bootable installation images are as follows: The supplemental driver disk images are:

10.1.3  Patching the Installer

At times there are bugfixes or feature enhancements available for the installer. These are typically replacement python source files which override the versions distributed with the release. Python has a mechanism similar to the command line shell search path for executables, so the installer can be updated by putting patched files in a location earlier in the search path Python uses to find modules. The 'install-methods.txt' document describes all the various ways the installer can be told where to find the updated source files. Typically this is done from an 'update disk', which is a floppy with an ext2 filesystem on it. The updated python source files are put in the main directory of the floppy. The installer is invoked with an 'updates' option from the boot command line, and the user is prompted to insert the update disk. The files are copied off into a ramdisk location which Python has been instructed to look at first for modules. For NFS installs, any files in the directory 'RHupdates' under the directory mounted in the loader will also be used before the source files shipped in the release. If one is customizing the distribution and the installer, then installing over NFS is the fastest way to work.
The installer will also use an 'updates.img' file to get patched source files. This is particularly useful for FTP and HTTP based installs. When the second stage image is retrieved from the server, a download of the updates.img is also attempted. This file must be an ext2 filesystem image. It is mounted loopback, then the contents are copied to the ramdisk location that Python is set up to look at for module updates. This update image will also work with all the other installation mechanisms, although the exact location where it is expected does vary. The 'install-methods.txt' file has the details on this.
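As a rough sketch, building such an updates.img might go as follows; the image size, the mount point, and the patched-module.py file name are arbitrary assumptions here, not anything mandated by anaconda.

dd if=/dev/zero of=updates.img bs=1k count=1440    # empty 1.44 MB image file
mke2fs -F updates.img                              # put an ext2 filesystem on it
mkdir -p /mnt/updates
mount -o loop updates.img /mnt/updates
cp patched-module.py /mnt/updates/                 # patched python sources go in the top level
umount /mnt/updates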

10.1.4  Invocation Options

These are the boot time command-line arguments:

10.1.5  Further Information

More fascinating tidbits about Anaconda can be found at:

Anaconda Development List
https://listman.redhat.com/mailman/listinfo/anaconda-devel-list

Kickstart List
https://listman.redhat.com/mailman/listinfo/kickstart-list

10.2  Kickstart

The Anaconda documentation in /usr/share/doc/anaconda-*.* includes the following copyright notice.
Copyright (c) 2003 by Red Hat, Inc. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, V1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).
Distribution of substantively modified versions of this document is prohibited without the explicit permission of the copyright holder.
Distribution of the work or derivative of the work in any standard (paper) book form for commercial purposes is prohibited unless prior permission is obtained from the copyright holder.
Red Hat, Red Hat Network, the Red Hat "Shadow Man" logo, RPM, Maximum RPM, the RPM logo, Linux Library, PowerTools, Linux Undercover, RHmember, RHmember More, Rough Cuts, Rawhide and all Red Hat-based trademarks and logos are trademarks or registered trademarks of Red Hat, Inc. in the United States and other countries.
Linux is a registered trademark of Linus Torvalds.

10.2.1  Introduction

What are Kickstart Installations?

Many system administrators would prefer to use an automated installation method to install Red Hat Enterprise Linux on their machines. To answer this need, Red Hat created the kickstart installation method. Using kickstart, a system administrator can create a single file containing the answers to all the questions that would normally be asked during a typical installation.
Kickstart files can be kept on a single server system and read by individual computers during the installation. This installation method can support the use of a single kickstart file to install Red Hat Enterprise Linux on multiple machines, making it ideal for network and system administrators.
Kickstart provides a way for users to automate a Red Hat Enterprise Linux installation.

How Do You Perform a Kickstart Installation?

Kickstart installations can be performed using a local CD-ROM, a local hard drive, or via NFS, FTP, or HTTP.
To use kickstart, you must:
  1. Create a kickstart file.
  2. Create a boot diskette with the kickstart file or make the kickstart file available on the network.
  3. Make the installation tree available.
  4. Start the kickstart installation.
This chapter explains these steps in detail.

Creating the Kickstart File

The kickstart file is a simple text file, containing a list of items, each identified by a keyword. You can create it by editing a copy of the sample.ks file found in the RH-DOCS directory of the Red Hat Enterprise Linux Documentation CD, using the Kickstart Configurator application, or writing it from scratch. The Red Hat Enterprise Linux installation program also creates a sample kickstart file based on the options that you selected during installation. It is written to the file /root/anaconda-ks.cfg. You should be able to edit it with any text editor or word processor that can save files as ASCII text.
First, be aware of the following issues when you are creating your kickstart file:

10.2.2  Kickstart Options

The following options can be placed in a kickstart file. If you prefer to use a graphical interface for creating your kickstart file, you can use the Kickstart Configurator application.
Note: If the option is followed by an equals mark (=), a value must be specified after it. In the example commands, options in brackets ([]) are optional arguments for the command.

10.2.3  Package Selection

Use the %packages command to begin a kickstart file section that lists the packages you would like to install (this is for installations only, as package selection during upgrades is not supported).
Packages can be specified by group or by individual package name. The installation program defines several groups that contain related packages. Refer to the RedHat/base/comps.xml file on the first Red Hat Enterprise Linux CD-ROM for a list of groups. Each group has an id, user visibility value, name, description, and package list. In the package list, the packages marked as mandatory are always installed if the group is selected, the packages marked default are selected by default if the group is selected, and the packages marked optional must be specifically selected even if the group is selected to be installed.
In most cases, it is only necessary to list the desired groups and not individual packages. Note that the Core and Base groups are always selected by default, so it is not necessary to specify them in the %packages section.
Here is an example %packages selection:
   %packages                                                                  
   @ X Window System                                                          
   @ GNOME Desktop Environment                                                
   @ Graphical Internet                                                       
   @ Sound and Video                                                          
   dhcp                                                                       
   
As you can see, groups are specified, one to a line, starting with an @ symbol, a space, and then the full group name as given in the comps.xml file. Groups can also be specified using the id for the group, such as gnome-desktop. Specify individual packages with no additional characters (the dhcp line in the example above is an individual package).
You can also specify which packages not to install from the default package list:
-autofs
The following options are available for the %packages option:

10.2.4  Pre-installation Script

You can add commands to run on the system immediately after the ks.cfg has been parsed. This section must be at the end of the kickstart file (after the commands) and must start with the %pre command. You can access the network in the %pre section; however, name service has not been configured at this point, so only IP addresses will work.
Note: The pre-install script is not run in the change root environment.

Example

Here is an example %pre section:
%pre                                                                                
                                                                                    
#!/bin/sh                                                                           
                                                                                    
hds=""                                                                              
mymedia=""                                                                          
                                                                                    
for file in /proc/ide/h*                                                            
do                                                                                  
  mymedia=`cat $file/media`                                                         
  if [ $mymedia == "disk" ] ; then                                                  
      hds="$hds `basename $file`"                                                   
  fi                                                                                
done                                                                                
                                                                                    
set $hds                                                                            
numhd=`echo $#`                                                                     
                                                                                    
drive1=`echo $hds | cut -d' ' -f1`                                                  
drive2=`echo $hds | cut -d' ' -f2`                                                  
                                                                                    
#Write out partition scheme based on whether there are 1 or 2 hard drives           
                                                                                    
if [ $numhd == "2" ] ; then
  #2 drives
  echo "#partitioning scheme generated in %pre for 2 drives" > /tmp/part-include
  echo "clearpart --all" >> /tmp/part-include
  echo "part /boot --fstype ext3 --size 75 --ondisk $drive1" >> /tmp/part-include
  echo "part / --fstype ext3 --size 1 --grow --ondisk $drive1" >> /tmp/part-include
  echo "part swap --recommended --ondisk $drive1" >> /tmp/part-include
  echo "part /home --fstype ext3 --size 1 --grow --ondisk $drive2" >> /tmp/part-include
else
  #1 drive
  echo "#partitioning scheme generated in %pre for 1 drive" > /tmp/part-include
  echo "clearpart --all" >> /tmp/part-include
  echo "part /boot --fstype ext3 --size 75" >> /tmp/part-include
  echo "part swap --recommended" >> /tmp/part-include
  echo "part / --fstype ext3 --size 2048" >> /tmp/part-include
  echo "part /home --fstype ext3 --size 2048 --grow" >> /tmp/part-include
fi

This script determines the number of hard drives in the system and writes a text file with a different partitioning scheme depending on whether it has one or two drives. Instead of having a set of partitioning commands in the kickstart file, include the line:

%include /tmp/part-include

The partitioning commands selected in the script will be used.

10.2.5  Post-installation Script

You have the option of adding commands to run on the system once the installation is complete. This section must be at the end of the kickstart file and must start with the %post command. This section is useful for functions such as installing additional software and configuring an additional nameserver.
Note: If you configured the network with static IP information, including a nameserver, you can access the network and resolve IP addresses in the %post section. If you configured the network for DHCP, the /etc/resolv.conf file has not been completed when the installation executes the %post section. You can access the network, but you can not resolve IP addresses. Thus, if you are using DHCP, you must specify IP addresses in the %post section.
Note: The post-install script is run in a chroot environment; therefore, performing tasks such as copying scripts or RPMs from the installation media will not work.

Examples

Turn services on and off:
/sbin/chkconfig --level 345 telnet off                                     
/sbin/chkconfig --level 345 finger off                                     
/sbin/chkconfig --level 345 lpd off                                        
/sbin/chkconfig --level 345 httpd on                                       

Run a script named runme from an NFS share:
mkdir /mnt/temp                                                            
mount 10.10.0.2:/usr/new-machines /mnt/temp                                
open -s -w -- /mnt/temp/runme                                              
umount /mnt/temp

Add a user to the system:
/usr/sbin/useradd bob                                                      
/usr/bin/chfn -f "Bob Smith" bob                                           
/usr/sbin/usermod -p 'kjdf$04930FTH/ ' bob  

10.2.6  Making the Kickstart File Available

A kickstart file must be placed in one of the following locations: on a boot diskette, on a boot CD-ROM, or on the network.
Normally a kickstart file is copied to the boot diskette, or made available on the network. The network-based approach is most commonly used, as most kickstart installations tend to be performed on networked computers.
Let us take a more in-depth look at where the kickstart file may be placed.

Creating a Kickstart Boot Diskette

To perform a diskette-based kickstart installation, the kickstart file must be named ks.cfg and must be located in the boot diskette's top-level directory. Refer to the section Making an Installation Boot Diskette in the Red Hat Enterprise Linux Installation Guide for instructions on creating a boot diskette. Because the boot diskettes are in MS-DOS format, it is easy to copy the kickstart file under Linux using the mcopy command:

mcopy ks.cfg a:

Alternatively, you can use Windows to copy the file. You can also mount the MS-DOS boot diskette in Red Hat Enterprise Linux with the file system type vfat and use the cp command to copy the file onto the diskette, as shown below.
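The mount-based variant might look like the following (assuming the floppy device is /dev/fd0 and that /mnt/floppy exists as a mount point):

mount -t vfat /dev/fd0 /mnt/floppy
cp ks.cfg /mnt/floppy/
umount /mnt/floppy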

Creating a Kickstart Boot CD-ROM

To perform a CD-ROM-based kickstart installation, the kickstart file must be named ks.cfg and must be located in the boot CD-ROM's top-level directory. Since a CD-ROM is read-only, the file must be added to the directory used to create the image that is written to the CD-ROM. Refer to the Making an Installation Boot CD-ROM section in the Red Hat Enterprise Linux Installation Guide for instructions on creating a boot CD-ROM; however, before making the file.iso image file, copy the ks.cfg kickstart file to the isolinux/ directory.
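The image creation step then typically uses mkisofs with the usual isolinux boot options; a rough sketch, run from the directory that contains the isolinux/ subdirectory and with paths and volume options adjusted to your layout, might look like:

mkisofs -o file.iso -b isolinux/isolinux.bin -c isolinux/boot.cat \
    -no-emul-boot -boot-load-size 4 -boot-info-table -R -J -T .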

Making the Kickstart File Available on the Network

Network installations using kickstart are quite common, because system administrators can easily automate the installation on many networked computers quickly and painlessly. In general, the approach most commonly used is for the administrator to have both a BOOTP/DHCP server and an NFS server on the local network. The BOOTP/DHCP server is used to give the client system its networking information, while the actual files used during the installation are served by the NFS server. Often, these two servers run on the same physical machine, but they are not required to.
To perform a network-based kickstart installation, you must have a BOOTP/DHCP server on your network, and it must include configuration information for the machine on which you are attempting to install Red Hat Enterprise Linux. The BOOTP/DHCP server will provide the client with its networking information as well as the location of the kickstart file.
If a kickstart file is specified by the BOOTP/DHCP server, the client system will attempt an NFS mount of the file's path, and will copy the specified file to the client, using it as the kickstart file. The exact settings required vary depending on the BOOTP/DHCP server you use.
Here is an example of a line from the dhcpd.conf file for the DHCP server:
filename "/usr/new-machine/kickstart/";                                    
next-server blarg.redhat.com;                                              

Note that you should replace the value after filename with the name of the kickstart file (or the directory in which the kickstart file resides) and the value after next-server with the NFS server name.
If the filename returned by the BOOTP/DHCP server ends with a slash ("/"), then it is interpreted as a path only. In this case, the client system mounts that path using NFS, and searches for a particular file. The filename the client searches for is:

<ip-addr>-kickstart

The <ip-addr> section of the filename should be replaced with the client's IP address in dotted decimal notation. For example, the filename for a computer with an IP address of 10.10.0.1 would be 10.10.0.1-kickstart.
Note that if you do not specify a server name, then the client system will attempt to use the server that answered the BOOTP/DHCP request as its NFS server. If you do not specify a path or filename, the client system will try to mount /kickstart from the BOOTP/DHCP server and will try to find the kickstart file using the same <ip-addr>-kickstart filename as described above.
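For a concrete (and entirely hypothetical) picture, the filename and next-server settings normally sit inside a host or subnet declaration in dhcpd.conf, along these lines, with the host name, MAC address, and IP addresses all being placeholders:

host compute-0-0 {
    hardware ethernet 00:11:22:33:44:55;
    fixed-address 10.10.0.1;
    next-server 10.10.0.2;
    filename "/usr/new-machine/kickstart/";
}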

Making the Installation Tree Available

The kickstart installation needs to access an installation tree. An installation tree is a copy of the binary Red Hat Enterprise Linux CD-ROMs with the same directory structure.
If you are performing a CD-based installation, insert the Red Hat Enterprise Linux CD-ROM #1 into the computer before starting the kickstart installation.
If you are performing a hard-drive installation, make sure the ISO images of the binary Red Hat Enterprise Linux CD-ROMs are on a hard drive in the computer.
If you are performing a network-based (NFS, FTP, or HTTP) installation, you must make the installation tree available over the network. Refer to the Preparing for a Network Installation section of the Red Hat Enterprise Linux Installation Guide for details.
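A minimal sketch of building such a tree from the CD ISO images and then exporting it over NFS might run as follows; the directory names, ISO file names, and export options are all assumptions to adapt to your own setup.

mkdir -p /export/rhel/tree /mnt/iso
for iso in rhel-disc1.iso rhel-disc2.iso rhel-disc3.iso
do
    mount -o loop $iso /mnt/iso           # loopback-mount each CD image
    cp -a /mnt/iso/. /export/rhel/tree/   # merge its contents into the tree
    umount /mnt/iso
done
echo "/export/rhel/tree *(ro,no_root_squash)" >> /etc/exports
exportfs -a                               # make the tree visible to NFS clients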

Starting a Kickstart Installation

To begin a kickstart installation, you must boot the system from a Red Hat Enterprise Linux boot diskette, Red Hat Enterprise Linux boot CD-ROM, or the Red Hat Enterprise Linux CD-ROM #1 and enter a special boot command at the boot prompt. The installation program looks for a kickstart file if the ks command line argument is passed to the kernel.
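At the boot prompt this typically takes one of the following forms (the NFS server address and path here are just placeholders), depending on where the kickstart file lives; a bare ks argument tells the installer to ask the BOOTP/DHCP server for the file's location, as described in the previous section.

boot: linux ks=floppy
boot: linux ks=cdrom:/ks.cfg
boot: linux ks=nfs:10.10.0.2:/kickstart/ks.cfg
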
Other options to start a kickstart installation are as follows:


