Thursday, June 10, 2004
G. W. Stewart has created Matran, a matrix wrapper for Fortran 95, to make it easier to do matrix computations within Fortran 95.
I would like to announce the availability of Matran (pronounced
MAY-tran), a Fortran 95 wrapper that implements matrix operations and
computes matrix decompositions using Lapack and the Blas. Although
Matran is not based on a formally defined matrix language, it provides
the flavor and convenience of coding in matrix oriented systems like
Matlab, Octave, etc. By using routines from Lapack and the Blas,
Matran allows the user to obtain the computational benefits of these
packages with minimal fuss and bother.
Matran has the following features.
- Matran supports computation with general, diagonal, triangular,
Hermitian, and positive definite matrices. Empty matrices are
also handled.
- Matran uses overloaded and defined operators to implement
the common matrix operations, including manipulations with
partitioned matrices. In addition, Matran provides functions
and constructors for norms, random matrices, etc.
- Matran supports the following matrix decompositions: Pivoted
LU, Cholesky, QR, pivoted QR, spectral, SVD, and real Schur.
- Matran allocates memory automatically, attempting to reuse
memory whenever possible. It provides a systematic way for
deallocating memory when it must be done by the programmer.
- Matran is an open package. It is easy to add new capabilities.
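For readers who haven't seen the overloaded-operator style Matran borrows, here is roughly what the same kind of computation looks like in NumPy. This is an analogy only, not Matran syntax: a positive definite solve via the Cholesky decomposition, with LAPACK doing the work underneath in both systems.

```python
import numpy as np

# Matlab-flavored matrix session sketched in NumPy: overloaded
# operators for the arithmetic, LAPACK routines for the decomposition.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
P = A @ A.T + 5.0 * np.eye(5)        # a symmetric positive definite matrix

L = np.linalg.cholesky(P)            # Cholesky factor (LAPACK dpotrf underneath)
b = rng.standard_normal(5)
y = np.linalg.solve(L, b)            # forward substitution
x = np.linalg.solve(L.T, y)          # back substitution: now P @ x == b

residual = np.linalg.norm(P @ x - b)
print(residual < 1e-10)
```

The point Matran makes is that this kind of code can be written in Fortran 95 itself, with the compiler's native arrays, rather than in an external scripting language.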
A big problem with current MPI implementations is their lack of fault tolerance. If one of the computing nodes goes down, any parallel MPI jobs using that node will also crash.
The FT-MPI project aims to remedy that situation.
Fault Tolerant MPI (FT-MPI) is an independent implementation of the MPI 1.2
message passing standard that has been built from the ground up offering
both user and system level fault tolerance. The FT-MPI library gives
application developers the ability to build fault tolerant or survivable
applications that do not immediately exit due to the failure of a processor,
node, or MPI task. To reach this goal, FT-MPI extends the current MPI
specification, playing a leading role in current world-wide efforts to
improve the error-handling of MPI applications.
FT-MPI is an efficient implementation of the MPI standard, and its
performance is comparable to other public MPI implementations. This has been
achieved through the use of optimized data type handling, an efficient point
to point communications progress engine, and highly tuned and configurable
collective communications.
The first full FT-MPI release was announced at the SC2003 Conference in
Phoenix November 2003. This release included all the functions defined in
the MPI 1.2 document as well as several sections of the MPI 2
specifications. The release was validated by the IBM, Intel, PACX, MPICH,
BLACS and ScaLAPACK test suites. Performance comparisons have been made
using the HPC Challenge Benchmarks (http://icl.cs.utk.edu/hpcc/). A number
of fault survivable numeric applications have also been developed and are
distributed as user modifiable examples.
I've found another freely available MATLAB-esque package. This one's from the functional language community. It's called PsiLAB and is based on the O'CaML language.
PsiLAB is written mainly in the functional language O'CaML developed at INRIA research laboratories. It's mainly made of three parts:
- An interpreter, of course O'CaML itself
- libraries written in O'CaML,
- external libraries written in Fortran and C.
Main features of PsiLAB are:
- All O'CaML functions and data types are supported,
- support for different data types: float, int, complex
- extensive matrix package
- 2D and 3D plot package with graphical or postscript output
- various generic and special mathematical functions
- linear algebra package (solving of linear equation systems and linear least square problems)
- Linear Regression
- non linear least square fit routines
- Fast Fourier Transformations
- some image processing functions
- online help system, easily extensible by user functions
- easy to extend for people knowing basics about the O'CaML C extension facilities
PsiLAB uses the following external libraries, mainly written in Fortran:
- LAPACK: Linear algebra and linear least square problems
- MINPACK: Non linear least square fits
- PLPLOT: 2D and 3D plot library with several output drivers (X11, PS, Xfig,...)
- FFTW: Fastest Fourier Transform in the West (and the East ?)
- AMOS: Several special functions: Bessel Polynomials and more ...
- SLATEC (partially implemented): More special functions (Gamma function,...)
- CamlImages (partially implemented): Support for various image formats
PsiLAB is not only written in O'CaML, it is CaML. That means that if you are familiar with this programming language, you can write PsiLAB programs, and you can do everything with PsiLAB that you can do with the generic O'CaML development system:
- using modules for access to data base servers
- creating new develop environments
- writing lexers and parsers (perhaps with mathematical background)
- more sophisticated image processing
- http servers (with direct access to your computation results ?)
- and many more ...
The CaML interpreter system, which is in reality a pure compiler, was chosen because of its high computation speed and high portability. You have the advantages of an interpreter-like language (from the user's point of view) combined with performance comparable to C/C++ programs. All functions are translated by the CaML compiler into a system- and machine-independent byte code, which is then executed on a virtual machine. Currently, you have a terminal-driven environment with online help. Plots are printed to an additional X11 window or to a PostScript file.
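As a taste of the linear least-squares functionality PsiLAB hands off to LAPACK, here is the equivalent computation sketched in NumPy (Python standing in for O'CaML here; the synthetic data is my own, not a PsiLAB example):

```python
import numpy as np

# Fit a line y = a*x + b to slightly perturbed data by linear least
# squares, the kind of problem PsiLAB routes to LAPACK's *gelsd.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.01 * np.sin(37.0 * x)    # "noisy" observations

A = np.column_stack([x, np.ones_like(x)])       # design matrix
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)  # LAPACK does the real work
a, b = coeffs
print(round(a, 2), round(b, 2))                 # close to the true 2 and 1
```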
A new version of PLTMG is available.
PLTMG 9.0 is a package for solving elliptic partial differential
equations in general regions of the plane. It is based on continuous
piecewise linear triangular finite elements. PLTMG features several
adaptive meshing options and an algebraic multilevel solver for the
resulting systems of linear equations. PLTMG provides a suite of
continuation options to handle PDEs with parameter dependencies.
It also provides options for solving several classes of optimal
control and obstacle problems. The package includes an initial
mesh generator and several graphics packages. Support for the
Bank-Holst parallel adaptive meshing paradigm is also provided.
PLTMG is provided as Fortran (and a little C) source code, in both
single and double precision versions. The code has interfaces to
X-Windows, MPI, and Michael Holst's OpenGL display tool SG. The
X-Windows, MPI, and SG interfaces require libraries that are NOT
provided as part of the PLTMG package.
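To give a feel for the continuous piecewise linear elements PLTMG is built on, here is a deliberately tiny 1-D analogue in NumPy. PLTMG itself works on general 2-D regions with adaptive meshes and a multilevel solver; none of this is PLTMG code, and the right-hand side is chosen so the exact solution is known.

```python
import numpy as np

# 1-D model problem: -u'' = f on (0,1), u(0) = u(1) = 0, discretized
# with linear "hat" functions on a uniform mesh of n elements.
n = 64
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)

# Stiffness matrix for piecewise linear elements: tridiag(-1, 2, -1)/h
K = (np.diag(2.0 * np.ones(n - 1))
     - np.diag(np.ones(n - 2), 1)
     - np.diag(np.ones(n - 2), -1)) / h

f = np.pi**2 * np.sin(np.pi * x[1:-1])   # f chosen so that u = sin(pi x)
F = h * f                                 # lumped load vector

u = np.linalg.solve(K, F)                 # nodal values of the FEM solution
err = np.max(np.abs(u - np.sin(np.pi * x[1:-1])))
print(err < 1e-2)                         # second-order accurate in h
```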
The Harminv library for harmonic inversion has been released.
Harminv is a free program (and accompanying library) to solve the problem of harmonic inversion — given a discrete-time, finite-length signal that consists of a sum of finitely-many sinusoids (possibly exponentially decaying) in a given bandwidth, it determines the frequencies, decay constants, amplitudes, and phases of those sinusoids.
It can, in principle, provide much better accuracy than straightforwardly extracting FFT peaks, essentially because it assumes a specific form for the signal. (Fourier transforms, in contrast, attempt to represent any data as a sum of sinusoidal components, and are thus limited by the uncertainty principle.) It is also often more robust than directly least-squares fitting the data (which can have problematic convergence), since it re-expresses the problem in terms of simply finding the eigenvalues of a small matrix.
Harminv uses a low-storage "filter diagonalization method" (FDM) for finding the sinusoids near a given frequency interval.
This kind of spectral analysis has wide applications in many areas of physics and engineering, as well as other fields. For example, it could be used to extract the vibrational or "eigen" modes of a system from its response to some stimulus, and also their rates of decay in dissipative systems. FDM has been applied to analyze, e.g., NMR experimental data. It is especially appropriate for analyzing numerical simulations, e.g. of quantum mechanics or classical electromagnetism. In general, it is useful when you know on physical grounds that your system consists of a small number of decaying & oscillating modes in the bandwidth of interest, and is not appropriate to analyze more arbitrary waveforms.
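To make the inverse problem concrete, here is a miniature of harmonic inversion in NumPy: recovering the frequency and decay rate of a single damped sinusoid by linear prediction. This is a Prony-style method, not Harminv's filter diagonalization, and the signal is synthetic.

```python
import numpy as np

dt = 0.01
t = np.arange(500) * dt
f0, gamma = 3.0, 0.2                      # "unknown" frequency and decay rate
s = np.exp(-gamma * t) * np.cos(2 * np.pi * f0 * t)

# A damped sinusoid obeys the two-term recurrence
#   s[n] = a*s[n-1] + b*s[n-2],  a = 2 e^{-g dt} cos(w dt),  b = -e^{-2 g dt}
A = np.column_stack([s[1:-1], s[:-2]])
a, b = np.linalg.lstsq(A, s[2:], rcond=None)[0]

# The roots of z^2 - a z - b are the complex poles e^{(-g +/- i w) dt}
z = np.roots([1.0, -a, -b])
pole = z[np.argmax(z.imag)]               # pick the positive-frequency pole
freq = np.angle(pole) / (2 * np.pi * dt)
decay = -np.log(np.abs(pole)) / dt
print(round(freq, 6), round(decay, 6))
```

With many overlapping, noisy modes this direct approach degrades quickly, which is exactly where FDM's small-eigenproblem formulation earns its keep.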
Version 5.0 of deal.II,
a finite element Differential Equations Analysis Library, has been released.
deal.II is a C++ program library targeted at adaptive finite elements and error estimation. It uses state-of-the-art programming techniques of the C++ programming language to offer you a modern interface to the complex data structures and algorithms required for adaptivity and enables you to use a variety of finite elements in one, two, and three space dimensions, as well as time-dependent problems.
The main aim of deal.II is to enable development of modern finite element algorithms, using among other aspects sophisticated error estimators and adaptive meshes. Writing such programs is a non-trivial task, and successful programs tend to become very large and complex. We therefore believe that this is best done using a program library that frees the application programmer from aspects like grid handling and refinement, handling of degrees of freedom, input of meshes and output of results in graphics formats, and the like. Also, support for several space dimensions at once is included in a way such that programs can be written independent of the space dimension without unreasonable penalties on run-time and memory consumption.
Among other things, it offers:
- Support for one, two, and three space dimensions, using a unified interface that allows writing programs that are almost dimension independent.
- Handling of locally refined grids, including different adaptive refinement strategies based on local error indicators and error estimators.
- Support for a variety of finite elements, including Lagrange elements of order one through four, discontinuous elements, Nedelec elements, and elements composed of other elements.
- Extensive documentation: all documentation is available online in a logical tree structure to allow fast access to the information you need. If printed it comprises more than 400 pages of tutorials, several reports, and presently some 3,800 pages of programming interface documentation with explanations of all classes, functions, and variables. All documentation comes with the library and is available online locally on your computer after installation.
- Modern software techniques that make access to the complex data structures and algorithms as transparent as possible. The use of object oriented programming allows for program structures similar to the structures in mathematical analysis.
- Fast algorithms that enable you to solve problems with up to several million degrees of freedom quickly. In contrast to symbolic algebra packages, the penalty paid for readability is low.
- A complete stand-alone linear algebra library including sparse matrices, vectors, Krylov subspace solvers, support for blocked systems, and interface to other packages such as PETSc and METIS.
- Support for several output formats, including many common formats for visualization of scientific data.
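The linear algebra layer works the way you'd expect from any sparse toolkit. As a rough illustration of the assemble-then-Krylov-solve pattern (in SciPy here, not deal.II's C++ API):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

# Assemble a sparse symmetric positive definite system (the 1-D
# Laplacian stencil) and solve it with conjugate gradients.
n = 200
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = cg(A, b)                        # Krylov subspace solver
residual = np.linalg.norm(A @ x - b)
print(info == 0, residual < 1e-3)
```

In deal.II the same steps appear as SparsityPattern/SparseMatrix assembly followed by one of its solver classes, with the option of handing the system to PETSc instead.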
LAPACK AND SCALAPACK TO BE UPDATED
Jack Dongarra et al. are in the process of updating the venerable LAPACK and
ScaLAPACK packages and want feedback from the user community.
We plan to update the LAPACK and ScaLAPACK libraries and would like to have
feedback from users on what functionalities they think are missing and would
be needed in order to make these libraries more useful for the community. We
invite you to enter your suggestions in the form below. It would be most useful
to have input by June 16th, although we would welcome your input at any time.
Both LAPACK and ScaLAPACK provide well-tested, open source, reviewed code
implementing trusted algorithms that guarantee reliability, efficiency and
accuracy. Any new functionality must adhere to these standards and should
have a significant impact in order to justify the development costs. We are
also interested in suggestions regarding user interfaces, documentation,
language interfaces, target (parallel) architectures and other issues, again
provided the impact is large enough.
We already plan to include a variety of improved algorithms discovered over
the years by a number of researchers (e.g. faster or more accurate
eigenvalue and SVD algorithms, extra precise iterative refinement, recursive
blocking for some linear solvers, etc.). We also know of a variety of other
possible functions we could add (e.g. updating and downdating
factorizations), but are uncertain of their impact.
Please see http://icl.cs.utk.edu/lapack-survey.html for the survey.
We would like to have your input by June 16th, 2004.
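One of the planned additions, extra precise iterative refinement, is easy to sketch. Below is a plain (not extra precision) refinement loop in NumPy/SciPy; the LAPACK version would accumulate the residual in higher precision, but the structure of the iteration is the same.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
n = 200
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

lu = lu_factor(A)                 # factor once (LAPACK dgetrf)
x = lu_solve(lu, b)               # initial solve
for _ in range(2):                # refinement sweeps reuse the factors
    r = b - A @ x                 # residual (extra precise in real LAPACK)
    x = x + lu_solve(lu, r)       # cheap correction solve

print(np.linalg.norm(A @ x - b) < 1e-10)
```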
AXIOM NOW AVAILABLE
A while back I mentioned that the source code for the former commercial symbolic math package Axiom was going to be made available in the near future. Well, we're there, and
here it is. Additional documentation can be found there as well.
Axiom is a general purpose Computer Algebra system. It is useful for research and development of mathematical algorithms. It defines a strongly typed, mathematically correct type hierarchy. It has a programming language and a built-in compiler.
Axiom has been in development since 1971. At that time, it was called Scratchpad. Scratchpad was a large, general purpose computer algebra system that was originally developed by IBM under the direction of Richard Jenks. The project started in 1971 and evolved slowly. Barry Trager was key to the technical direction of the project. Scratchpad developed over a 20 year stretch and was basically considered a research platform for developing new ideas in computational mathematics. In the 1990s, as IBM's fortunes slid, the Scratchpad project was renamed to Axiom, sold to the Numerical Algorithms Group (NAG) in England, and became a commercial system. As part of the Scratchpad project at IBM in Yorktown, Tim Daly worked on all aspects of the system and eventually helped transfer the product to NAG. For a variety of reasons it never became a financial success, and NAG withdrew it from the market in October, 2001.
NAG agreed to release Axiom as free software. The basic motivation was that Axiom represents something different from other programs in a lot of ways. Primarily because of its foundation in mathematics the Axiom system will potentially be useful 30 years from now. In its current state it represents about 30 years and 300 man-years of research work. To strive to keep such a large collection of knowledge alive seems a worthwhile goal.
Efforts are underway to extend this software to (a) develop a better user interface (b) make it useful as a teaching tool (c) develop an algebra server protocol (d) integrate additional mathematics (e) rebuild the algebra in a literate programming style (f) integrate logic programming (g) develop an Axiom Journal with refereed submissions.
The Axiom sources are now available for anonymous download. The system builds correctly on Red Hat Linux 9.
For those who want to jump on the XML bandwagon but aren't willing to give up all their neat
TeX stuff, there's TeXML, an XML vocabulary for TeX. There's also a Python processor that translates TeXML source into TeX.
Thursday, April 15, 2004
CONTEXT AND MEMOIR
Although the LaTeX macros for the TeX typesetting system allow you
to quickly put together reports, books and the like, they just aren't very
flexible. A couple of alternatives with much more layout flexibility are
the memoir and ConTeXt macro packages. Each is extensively documented via
hundreds of pages worth of manuals.
Tuesday, March 30, 2004
The Algorithm Development and Mining (ADaM) system being developed at the University
of Alabama in Huntsville looks well-designed and useful.
ADaM...is used to apply data mining technologies to remotely-sensed and other scientific data. The mining and image processing toolkits consist of interoperable components that can be linked together in a variety of ways for application to diverse problem domains. ADaM has over 75 components that can be configured to create customized mining processes. Preprocessing and analysis utilities aid users in applying data mining to their specific problems. New components can easily be added to adapt the system to different science problems.
The components are grouped into several toolkits.
ADaM 4.0 components are general purpose mining and image processing modules that can be easily reused for multiple solutions and disciplines. These components are well positioned to address the needs for distributed mining and image processing services in web and grid applications.
More on the component architecture:
- classification techniques
- clustering techniques
- pattern recognition utilities
- association rules
- optimization techniques
- basic image processing operations
- segmentation/edge and shape detection
- texture features
ADaM's component architecture is designed to take advantage of emerging computational environments such as the Web and information Grids. Individual ADaM operations can execute in a stand-alone fashion, facilitating their use in distributed and parallel processing systems. The operations - organized as toolkits - provide pattern recognition, image processing, optimization, and association rule mining capabilities. Components are packaged in several ways, including C/C++ libraries, executables, and Python modules. Multi-interface component packaging facilitates rapid prototyping and efficient, performance-critical data mining application development. This approach also facilitates the use of ADaM components by and with third-party analysis and visualization systems.
Monday, March 29, 2004
While looking for some way - any way - to increase the performance of our
computational cluster, I found the DataSpace project.
DataSpace is a web services based infrastructure for exploring, analyzing, and mining remote and distributed data. This site describes DataSpace protocols, DataSpace applications, and open source DataSpace servers and clients.
The most interesting bit was the
SABUL protocol, although it is currently being superseded by UDT.
DataSpace applications employ a protocol for working with remote and distributed data called the DataSpace Transfer Protocol or DSTP. DSTP simplifies working with data by providing direct support for common operations, such as working with attributes, keys and metadata.
The DSTP protocol can be layered over specialized high performance transport protocols such as SABUL. Using protocols such as SABUL, DataSpace applications can effectively work on wide area high performance OC-3, OC-12 and Gbps networks. SABUL currently holds the landspeed record for connecting two distributed clusters, a record set at iGrid 02.
Starting in 2003, we began developing a new version of SABUL called UDP-based Data Transport Protocol or UDT, which uses UDP for both the control and data channel. An open challenge is to design protocols for high performance data transport so that they are friendly to both other flows using the same protocol (intra-protocol fairness) and to other flows employing different protocols, such as TCP (TCP friendliness). In both simulation and experimental studies using UDT, we have found UDT to be fair to both dozens of other UDT flows as well as friendly to hundreds of concurrent TCP flows.
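For the curious, the basic trick of building reliability on top of UDP is simple to demonstrate. Here is a stop-and-wait toy over loopback in Python; UDT itself is vastly more sophisticated, with windowing and congestion control, so treat this only as the zeroth-order idea.

```python
import socket
import struct
import threading

# Reliability over UDP in miniature: a 4-byte sequence number rides in
# front of each payload, and the receiver acknowledges every packet.

def receiver(sock, out, n_msgs):
    expected = 0
    while expected < n_msgs:
        data, addr = sock.recvfrom(2048)
        seq = struct.unpack("!I", data[:4])[0]
        if seq == expected:                    # deliver in-order data only
            out.append(data[4:])
            expected += 1
        sock.sendto(struct.pack("!I", seq), addr)   # ack (or re-ack a dup)

def sender(sock, peer, messages):
    sock.settimeout(0.5)
    for seq, payload in enumerate(messages):
        pkt = struct.pack("!I", seq) + payload
        while True:
            sock.sendto(pkt, peer)
            try:
                ack, _ = sock.recvfrom(2048)
                if struct.unpack("!I", ack)[0] == seq:
                    break                      # acked; move on
            except socket.timeout:
                continue                       # presumed lost; retransmit

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

msgs = [b"hello", b"udp", b"world"]
got = []
t = threading.Thread(target=receiver, args=(rx, got, len(msgs)))
t.start()
sender(tx, rx.getsockname(), msgs)
t.join()
print(got == msgs)
```

Stop-and-wait throughput collapses on long fat pipes, which is precisely why SABUL/UDT exist: they keep many packets in flight and manage the sending rate explicitly.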
Now all I have to do is figure out how frigging hard it would be to implement the UDT protocol in place
of TCP/IP on the cluster, and I've not yet begun to spin up sufficiently to appreciate the complexities
of the task.
HYPERSPECTRAL CHALLENGE PROBLEM
The Reconfigurable Computing Systems folks at LANL are tackling the
Hyperspectral Challenge Problem.
In the area of remote sensing, civilian, industrial, and military applications are becoming increasingly overwhelmed by the volume and complexity of imaging data now being collected. Airborne remote sensing was originally limited to black and white imagery, but now multi- and hyperspectral image sensors are delivering datasets with dozens, hundreds, or thousands of spectral channels per image spatial pixel. The processing of these datasets in real time has become a very difficult problem. We provide here an overview (large!) and example data (very large) and C code for a variety of platforms. The data set provided here must be processed within 3 seconds to meet real time requirements for representative systems. Typical LINUX workstations have been clocked with this problem at a few minutes.
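To get a sense of the per-pixel spectral arithmetic involved, here is a toy matched filter over a synthetic hyperspectral cube in NumPy. The data, the target spectrum, and the filter are all my own illustration, not the challenge problem's actual C code or dataset.

```python
import numpy as np

# Score every pixel of a (height x width x bands) cube against a
# known target spectrum; the implanted target pixel should win.
bands, h, w = 64, 32, 32
rng = np.random.default_rng(2)
cube = rng.standard_normal((h, w, bands))        # background "clutter"
target = np.sin(np.linspace(0.0, 3.0, bands))    # target spectral signature
cube[10, 20] += 5.0 * target                     # implant one target pixel

mu = cube.reshape(-1, bands).mean(axis=0)        # background mean spectrum
scores = (cube - mu) @ target                    # matched-filter scores, h x w
print(np.unravel_index(scores.argmax(), scores.shape))
```

Even this naive filter is O(pixels x bands) per target; scaling that to full scenes, many signatures, and a 3-second budget is what pushes the LANL group toward reconfigurable hardware.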
ISIS is a most interesting package for searching for patterns in images and signals,
the name of which brings back fond memories of lusting over the woman who
played the title character in a Saturday morning TV series with the same
name in the early 1970s.
Their take on the matter:
The Los Alamos ISIS program is developing a set of software packages and reconfigurable computing hardware to enable rapid exploration and analysis of images and signals. Packages in the ISIS software suite build customized, robust, automated algorithms for feature extraction and analysis. With current sensor platforms collecting a flood of high-quality data, automatic feature extraction (AFE) has become a key to enabling human analysts to keep up with the flow. The ISIS software packages produce AFE tools for features in multispectral, hyperspectral, panchromatic, and multi-instrument fused imagery. Both spectral and spatial signatures of image features are discovered and exploited. The software features an interactive graphical user interface, and a parallel/scalable processing backend that runs on off-the-shelf computers.
The Los Alamos ISIS software suite currently includes four "tool-maker" image/signal processing packages, as well as a common point-and-click graphical user interface (called Aladdin) for providing training data and running the tool-makers. The four tool-maker packages are:
- GENIE, which uses techniques from genetic programming to build customized spatio-spectral algorithms for a wide range of sensors (electro-optical, infrared, and other modalities; panchromatic through hyperspectral data). GENIE is designed to process imagery, and has also been applied to image-like signals (e.g., "waterfall" displays).
- POOKA, which combines reconfigurable computing hardware with evolutionary algorithms to allow rapid prototyping of image and signal processing algorithms implemented at the chip level. This enables POOKA to rapidly produce customized automated feature extraction algorithms that run hundreds of times faster than equivalent algorithms implemented in software. POOKA uses a commercially available reconfigurable computing board that plugs into standard Windows workstations.
- Afreet, which exploits recent advances in computational machine learning theory, combining adaptive spatio-spectral image processing with a powerful support vector machine (SVM) supervised classifier to process imagery and image-like signals.
- Zeus, which specializes in signal processing, using evolutionary computational techniques to build signal classification algorithms. Zeus is the newest member of the ISIS suite of toolmakers.
Friday, March 26, 2004
Yes, it's time again for my bi-yearly attempt at reviving this weblog just
before it flatlines. While the previous post
is but a lagniappe, some real meat (or vegetables, if you prefer), can
be found in the thoroughly updated
Software for Graphics and Data Analysis and
Ocean Circulation Modeling Projects sections.
The former has been updated for the first time in over seven years,
and the latter has been spruced up after about three years.
While the former was compiled by an oceanographer with oceanographic
applications in mind, the vast majority of packages therein can be
applied to just about any scientific - or non-scientific, for that matter -
discipline. The latter, on the other hand...well, how much more discipline-specific
can you get than a list of numerical ocean circulation models?
Thursday, March 25, 2004
The MCS Group at Argonne Labs has come up with an interesting suite of Unix administration tools called Msys. They are:
- whatami - We often need to be able to determine what type of "architecture" a given machine is, so that we can put the appropriate directories in a path or execute the correct program. No unix utility quite fulfills our requirements, so we've created the "whatami" program to return a single unique string on each architecture. This string is then used in directory names and in program switches.
- softenv - Softenv is a system used to build the user's environment. Each user has a ".soft" file in which they specify groups of applications that they're interested in. Softenv reads a central database when necessary to update the user's PATH, MANPATH and other variables.
- pkg - "Pkg" is designed to ease the management of a large number of GNU-style software packages. In particular, Pkg is designed to allow the installation of multiple versions of the same package (such as emacs) without creating conflicts. This is usually necessary in any kind of complex environment where legacy applications have to be supported for long periods of time. "Pkg" is a standalone set of scripts that don't depend on anything other than Perl 5. However, to be truly useful, "Pkg" should be combined with a user environment management system such as "softenv".
- chex - Chex is a simple tool used to examine and respond to logfiles. Mostly, Chex is used in conjunction with the conserver package for serial console log file scrutiny.
- sanity - Sanity is a tool that we use to maintain consistency across a large number of machines. By itself, sanity is a lightweight tool that simply runs a series of modules (defined by the sysadmin) that check various system settings such as installed software, config files, kernels, zipdrive devices, etc. Sanity can be run by hand, by cron, by rc file, or any combination thereof.
- gpd - A "General Purpose Daemon" written entirely in Perl. Key features include: services multiple server sockets, non-blocking for all I/O operations, can be renamed and run any number of times on a single machine to provide multiple services, can invoke internal or external (fork-exec) commands using Unix command invocation semantics ("program/subroutine <options/flags> <arguments>"), and has basic hostname/IP client authentication.
- netflowdb - Netflowdb is a collection of scripts that we have used for proof-of-concept capture of Cisco NetFlow records and storing them in SQL databases. These tools are referenced in the LISA 2000 paper Combining Cisco NetFlow Exports with Relational Database Technology for Usage Statistics, Intrusion Detection, and Network Forensics.
- amihappy - Amihappy is a monitoring package which can be run on one or many machines. The client script gathers information on a machine by running customizable modules based on its configuration file. It then sends this information via XML-RPC to server scripts which parse the data and update the main database. The state of the machines can then be viewed through a web browser, either by failures or by each machine name.
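The whatami idea is easy to approximate in a few lines. A Python sketch of the concept (the real whatami is its own program with its own naming conventions, which this does not reproduce):

```python
import platform

# Collapse the local machine's identity into a single string suitable
# for use in directory names and program switches, whatami-style.
def whatami():
    return "{}-{}".format(platform.machine().lower(),
                          platform.system().lower())

print(whatami())   # e.g. "x86_64-linux" on a typical Linux box
```

The payoff is the same as with the real tool: paths like `bin/$(whatami)/` let one shared filesystem serve a zoo of architectures.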
Wednesday, June 11, 2003
Daily Python-URL announces some pedagogical nuggets as well as some nifty software.
DSP RADIO FOR LINUX
There's an interesting ongoing project to provide DSP radio for Linux on Intel platforms. The intro:
Modern computers have the processing power to outperform conventional radios in receiving signals with poor S/N. Particularly when the poor S/N is due to interference rather than to white (galactic) noise the computer can remove interference within the narrow bandwidth of the desired signal by use of the information about the interference source retrieved by use of larger bandwidths. The signal processing can be far more clever than what has been possible before. Each interference source can be treated as a signal and the DSP radio can receive AND SEPARATE a large number of signals simultaneously. The DSP radio package is under development with flexibility and generality as important aspects. This page contains links to pages that describe different aspects of digital radio processing in the order they are encountered in the ongoing development. The DSP-radio for LINUX is designed for all narrow band modulation methods for all frequency bands.
In addition to supplying the software, the site provides a fairly thorough overview/tutorial on digital signal processing and radio signals.
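The interference-excision idea can be sketched in a few lines of NumPy: synthesize a weak tone plus a strong narrowband interferer, notch the interferer out in the frequency domain, and check what dominates the residue. This toy is mine, and nothing like the package's actual processing, which works on live receiver samples.

```python
import numpy as np

fs, n = 8000.0, 4000
t = np.arange(n) / fs
desired = 0.1 * np.sin(2 * np.pi * 700.0 * t)        # weak signal of interest
interferer = 5.0 * np.sin(2 * np.pi * 1900.0 * t)    # strong narrowband interferer

X = np.fft.rfft(desired + interferer)
freqs = np.fft.rfftfreq(n, 1.0 / fs)
X[np.abs(freqs - 1900.0) < 50.0] = 0.0               # excise the interferer
clean = np.fft.irfft(X, n)

peak = freqs[np.abs(np.fft.rfft(clean)).argmax()]    # dominant surviving tone
print(abs(peak - 700.0) < 1e-6)
```

The project's point is that once the radio is software, the "filter" can be arbitrarily clever: it can model each interferer as a signal in its own right rather than just notching bands.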
Friday, June 06, 2003
From NA-Digest, we learn about the FMM 2D Fast Multipole Methods Toolbox for Matlab. This is a 2-D toolbox, and from their remark that they'll be making "selected 3-D algorithms" available in the future, it looks like the full 3-D version will be a commercial product.
The FMM 2D Toolbox v1.0 for MATLAB is now available from MadMax Optics for free download. This Toolbox provides three kinds of solvers based on optimized Fast Multipole Methods (FMMs).
The PDE solvers are true black-box tools, based on FMM-accelerated integral equations using high-order quadrature and interpolation schemes. Solutions are obtained in O(N) time, where N is the number of points in the discretization. Boundaries are easily described as a sequence of points. Volume source data (for the Poisson and inhomogeneous modified Helmholtz equation) can be specified at a set of arbitrary locations, allowing the solver to be coupled with virtually any underlying data structure. No grid generation is required.
- Particle codes for 2D Coulombic and screened Coulombic interactions
- Solvers for homogeneous boundary value problems in interior or exterior multiply connected domains with Dirichlet or Neumann conditions: (Laplace and modified Helmholtz equations)
- Solvers for inhomogeneous boundary value problems in interior or exterior multiply connected domains with Dirichlet or Neumann conditions: (Poisson and modified Helmholtz equations)
In order to provide solvers with the maximum of flexibility, we have incorporated a number of novel features, including algorithms that can be applied to regions with corners and discontinuous boundary data. Thus, while the amount of work is of the order O(N), the solvers are optimized for flexibility rather than speed. The latter would require stricter control of the user's discretization schemes and more detailed assumptions about the user's data.
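For contrast, the direct O(N^2) sum that the FMM accelerates to O(N) is only a few lines of NumPy. In 2-D the Coulomb kernel is logarithmic; this sketch (synthetic data, not MadMax Optics code) is the naive baseline a particle code would otherwise use.

```python
import numpy as np

# Direct evaluation of 2-D Coulomb potentials:
#   phi_i = sum_{j != i} q_j * log|r_i - r_j|
rng = np.random.default_rng(3)
n = 300
pts = rng.random((n, 2))                  # particle positions
q = rng.standard_normal(n)                # charges

d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
np.fill_diagonal(d, 1.0)                  # log(1) = 0 removes self-interaction
phi = (np.log(d) * q).sum(axis=1)         # potential at each particle

print(phi.shape)
```

The N x N distance matrix makes the quadratic cost explicit; the FMM's multipole expansions avoid ever forming it.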
Via the NA-Digest we learn about SLEPc...
...a parallel software library for the solution of large sparse eigenvalue problems on parallel computers. It can be used for the solution of problems formulated in either standard or generalized form, both Hermitian and non-Hermitian, with either real or complex arithmetic.
SLEPc is built on top of PETSc, the Portable Extensible Toolkit for Scientific Computation. It can be considered an extension of PETSc providing all the functionality necessary for the solution of eigenvalue problems. SLEPc includes several eigensolvers as well as interfaces to other libraries such as ARPACK. It also provides built-in support for spectral transformations such as shift-and-invert.
Friday, December 06, 2002
Some good news (via Hack the Planet) about GForge, an open source version of SourceForge.
A key piece of open-source infrastructure is back, and not enough people have heard, yet. When VA Software Corp. took Tim Perdue's GPLed SourceForge project proprietary (or rather, took proprietary the only component it owned, the "alexandria" set of glue code), following the v. 2.5 release, development slowed for a number of reasons, including the multiplicity of open-source forks and VA Software's unfulfilled promise of a GPLed alexandria 2.7 release (announced for August 2000 release, but then silently dropped).
As of this past weekend, Tim Perdue is back, with GForge (http://gforge.org/), a greatly cleaned-up successor forked from VA Software's final beta, alexandria 2.61pre4. Tim has removed a great deal of unnecessary code and optimisations specific to sf.net (eliminating dependencies and simplifying installation), cleaned up the user interface, removed the little-used "foundry" feature, and added support for Jabber instant messaging. (It's important to note that gforge.org doesn't itself offer project hosting, but rather the software required to run hosting sites.)
Monday, December 02, 2002
Recent Freshmeat items of interest:
- CwMtx - A library that provides the matrix and vector operations that are used extensively in engineering and science problems. A special feature of this library is the quaternion class which implements quaternion math. Quaternions are very useful for attitude determination in 3D space because they do not suffer from singularities. Furthermore, successive rotations and transformations of vectors can be accomplished by simple quaternion multiplication.
- Astro::FITS::Header - Tools for reading, modifying and then writing out FITS standard header blocks to FITS and NDF files.
- wlog - A real-time signal analyzer using wavelets.
- duplicity - Encrypted bandwidth-efficient backup using the rsync algorithm.
- Buildtool - A portable build infrastructure that can be used in the development of any kind of program. Basically, it simplifies the build process of a program from the user's point of view by automatically configuring the source code with specific details of the host system; it also makes the developer's work easier because all Makefile complexity is hidden and behavior is homogenized.
- Source Navigator - A source code analysis tool. With it, you can edit your source code, display relationships between classes and functions and members, and display call trees. You can also build your projects, either with your own makefile, or by using Source-Navigator's build system to automatically generate a makefile.
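The quaternion machinery the CwMtx entry highlights can be sketched in a few lines. This is illustrative plain Python, not the CwMtx C++ API, and the helper names are made up: a unit quaternion q rotates a vector v via q * v * conj(q), and successive rotations compose by quaternion multiplication.

```python
import math

def qmul(a, b):
    """Hamilton product of quaternions represented as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def qconj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def from_axis_angle(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about `axis`."""
    n = math.sqrt(sum(c * c for c in axis))
    s = math.sin(angle / 2)
    return (math.cos(angle / 2),) + tuple(s * c / n for c in axis)

def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q (v as a pure quaternion)."""
    w, x, y, z = qmul(qmul(q, (0.0,) + tuple(v)), qconj(q))
    return (x, y, z)

# A 90-degree turn about z sends the x axis to the y axis...
qz = from_axis_angle((0, 0, 1), math.pi / 2)
vx, vy, vz = rotate(qz, (1, 0, 0))
print(round(vx, 6), round(vy, 6), round(vz, 6))    # 0.0 1.0 0.0

# ...and two such turns, composed by one multiplication, give 180 degrees.
rx, ry, rz = rotate(qmul(qz, qz), (1, 0, 0))       # roughly (-1, 0, 0)
```

Because a unit quaternion never hits the gimbal-lock singularities of Euler angles, chained rotations like the last line stay well behaved, which is the attitude-determination advantage the entry mentions.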
RELATIONAL MODELING OF BIOLOGICAL DATA
One of the many challenges facing the bioinformatics community is representing the hierarchical information structures they work with in relational databases. Aaron Mackey addresses this issue in Relational Modeling of Biological Data: Trees and Graphs. After reviewing publicly available biological data and projects, Mackey gets down to brass tacks:
While these projects have rich public Web sites for browsing and downloading the data, none of them provide the means to compute on the data, to relate your own data to theirs, or to query across the disparate resources. Fortunately, all of these data sources can be freely obtained by ftp or anonymous cvs for integration into a local, custom-made relational database. But how does one represent hierarchical data in a relational database? And, more importantly, how can one make efficient use of such data?
The AXIOM symbolic algebra system is no longer available as a commercial product from NAG. The good news is that NAG is apparently working on an open source release. Already available is Aldor, the standalone library compiler for AXIOM. The Axiom site hosts a mailing list for those interested in the project. It also contains an interesting document called the Rosetta, which shows how to perform the same actions in all of the currently available symbolic algebra systems.
LISPING AT JPL
A friend sends notice of Lisping at JPL by Erann Gat, a highly personal tale of the decline and fall of the use of Lisp at NASA's JPL. Here's the really grim part:
Now, you might expect that with a track record like that, with one technological success after another, that NASA would be rushing to embrace Lisp. And you would, of course, be wrong.
The New Millennium missions were supposed to be the flagships for NASA's new "better, faster, and cheaper" philosophy, which meant that we were given a budget that was impossibly small, and a schedule that was impossibly tight. When the inevitable schedule and budget overruns began the project needed a scapegoat. The crucial turning point was a major review with about 200 people attending, including many of JPL's top managers. At one point the software integration engineer was giving his presentation and listing all of the things that were going wrong. Someone (I don't know who) interrupted him and asked if he could change only one thing to make things better what would it be. His answer was: get rid of Lisp.
Monday, November 25, 2002
One of the painful things about message passing libraries like MPI, and something that keeps them from being used as much as they should be, is the large number of low-level calls that overwhelm you with details only loosely related to the problem you're actually trying to solve. The fine folks at the NOAA/FSL Aviation Division have created the Scalable Modeling System as a solution to this problem. Their take on the matter:
SMS provides directive-based parallelization of serial Fortran codes using an analysis and translation tool to transform SMS directives and serial code into a parallel version of the code. Twenty SMS directives are available to handle parallel operations including data decomposition, communication, local and global address translation, I/O, spectral transformations, and nesting. As the development of SMS has matured, the time and effort required to parallelize codes for MPPs has been reduced significantly. Code parallelization has become simpler because SMS provides support for advanced operations including incremental parallelization and parallel debugging.
The software is available, as are a very well written user's guide and a reference manual.
Jennifer Vesperman writes of third-party tools for use with CVS.
CVS (Concurrent Versions System) is a popular version control system. It provides many features, and is useful in many situations. It does, however, have its faults. The standard client works from the command line, it doesn't automatically integrate with development environments, and there are useful features it lacks. Not to worry. It's an open source program, and there are a host of third-party utilities that provide features and integration. There are also many graphical clients.
There's something available for just about every platform combination.
The latest installments in Daniel Robbins' advanced filesystem implementor's guide are Introduction to EVMS and More About EVMS. So what is it?
If you suspect that I'm about to say that EVMS handily solves all of these problems in one fell swoop, then you're absolutely right. It does. EVMS provides a uniform, extensible, plug-in-based API for all storage technologies under Linux. What does that mean? It means that thanks to EVMS, you can use a single tool to partition disks, create LVM objects, and even create Linux software RAID volumes. And you can use the same tool to combine these technologies in powerful ways. EVMS can see the "big picture"; it can see exactly how everything is layered, from the filesystem all the way down to the physical disks holding the data. Not only that, but EVMS is compatible with all your existing Linux technologies. It doesn't force you to replace your partitions, LVM, or software RAID volumes. Instead, it will gladly work and interact with your existing storage configuration via its unified storage management interface. In fact, EVMS currently offers your choice of a command-line interface, an ncurses-based interface, and a fantastic storage management GUI written in GTK+.
You can find the goodies at the EVMS home page.
Recent Freshmeat entries of interest:
- CSS - Cameron Simpson's Scripts are a collection of over 1000 scripts (and supporting Perl modules) for performing a wide variety of sysadmin, text manipulation and other tasks
- Moodle, a package for producing Internet-based courses and web sites
- Jabberwocky, a LISP IDE with a LISP-aware editor with syntax highlighting, parentheses matching, a source analyzer, indentation, a source level debugger, a project explorer, and an interaction buffer
- Purenum, an arbitrary precision bignum library for C++
- Quantum GIS, a GIS offering support for vector and raster formats as well as spatially-enabled tables in PostgreSQL using PostGIS
BIOS: THE BEGINNING OF THE END
An open source milestone has been reached by the misslab folks at the University of Maryland. According to their announcement (via Slashdot):
We're happy to announce that we've successfully booted Windows 2000 without a legacy proprietary BIOS. We accomplished this by developing software that combined elements from two very successful projects: LinuxBIOS and BOCHS. The Etherboot project also helped in various ways.
As a result, we now have a completely free software replacement for the BIOS that supports (without modification) either LILO or GRUB as bootloaders, and Linux, OpenBSD, and Windows 2000 as operating systems (NOTE: We're still working on supporting FreeBSD and Windows XP. We expect that improving ATA support will permit Win98 and WinXP to boot, and finishing PIRQ support will permit FreeBSD to boot.)
Motherboard support is limited at this time, but we hope to expand that along with LinuxBIOS.
Friday, November 22, 2002
The goal of the Signal Processing Algorithms Implementation Research for Adaptable Libraries (SPIRAL) project is "to automate the implementation, optimization, and platform adaptation of signal processing algorithms on different computer architectures (uniprocessors, multiprocessors, hardware)." How do they do it?
"In short, for each transform there are many algorithms, for each algorithm there are many conceivable implementations, and we try to pick the best combination of both w.r.t. the given computing platform. Since the search space is too large to be considered exhaustively, we employ intelligent search techniques to find an optimal solution."
The capabilities and features include:
- easily generating platform-adapted C or Fortran implementations of a number of DSP transforms, such as the discrete Fourier (DFT) or cosine (DCT) transforms, with support for arbitrary dimensions;
- generating platform-adapted C or Fortran implementations for composed transforms, e.g., a DFT followed by scaling followed by a DCT;
- easy extension to include new transforms;
- creating C or Fortran implementations for Fourier transforms of solvable groups; and
- analyzing a given matrix for symmetry and, in the positive case, factorizing it and converting the factorization into a C or Fortran program.
Similar projects include ATLAS and FFTW.
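The "intelligent search over implementations" idea can be illustrated with a toy autotuner. This is plain Python, wholly unrelated to the project's actual code generator: several candidate implementations of the same computation are timed on the machine at hand, and the fastest is kept.

```python
import timeit

# Three made-up candidate implementations of one "transform" (here, a sum).
def sum_loop(xs):
    total = 0.0
    for v in xs:
        total += v
    return total

def sum_builtin(xs):
    return sum(xs)

def sum_pairwise(xs):
    # Recursive pairwise summation: a different algorithmic variant.
    n = len(xs)
    if n <= 8:
        return sum(xs)
    mid = n // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])

candidates = [sum_loop, sum_builtin, sum_pairwise]
data = [float(i) for i in range(10000)]

# Time each candidate on this platform and select the fastest.
timings = {f.__name__: timeit.timeit(lambda f=f: f(data), number=50)
           for f in candidates}
best = min(timings, key=timings.get)
print("selected:", best)
```

SPIRAL, ATLAS, and FFTW all search far larger spaces (algorithm choice, blocking, unrolling) with smarter strategies than exhaustive timing, but the platform-adaptive principle is the same.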
Thursday, November 21, 2002
Interesting 11/20/02 and 11/19/02 Freshmeat items include:
- breve - 3-D simulation environment designed for the simulation of decentralized systems and artificial life
- OpenMosix - a set of Linux kernel extensions for building a computer cluster
- Jmol - a molecule viewer and editor
- RKWard - an extensible GUI for R
- KRoC - a retargetable Occam compiler
- BioConductor - a set of R packages for bioinformatics data analysis
- RLPlot - a plotting program for creating high quality graphs from data
- RefDB - a reference database and bibliography tool for SGML, XML and LaTeX/BibTeX documents
THE ZEN OF COMPREHENSIVE ARCHIVE NETWORKS
Jarkko Hietaniemi, the CPAN Master Librarian, writes (via use Perl;) about applying the CPAN model to other languages/projects.
"It seems that there is a lot of interest in having similar archives for other languages like CPAN  is for Perl. I should know; over the years people from at least Python, Ruby, and Java communities have approached me or other core CPAN people to ask basically 'How did we do it?'. Very recently I've seen even more interest from some people in the Perl community wanting to actively reach out a helping hand to other communities. This 'missive' tries to describe my thinking and help people wanting to build their own CANs. Since I hope this message will somehow end up reaching the other language communities I will explicitly include URLs that are (hopefully) obvious to Perl people."
DAILY PYTHON-URL EXTRACTS
Recent items from Daily Python-URL that may be of interest include:
Skipping (painfully) the obvious puns for now, I'll get right to the author's description:
Lush is an object-oriented programming language designed for researchers, experimenters, and engineers interested in large-scale numerical and graphic applications. Lush is designed to be used in situations where one would want to combine the flexibility of a high-level, loosely-typed interpreted language, with the efficiency of a strongly-typed, natively-compiled language, and with the easy integration of code written in C, C++, or other languages.
The documentation includes a 600-page reference manual and a brief tutorial.
Lush can be used advantageously for projects where one would otherwise use a combination of an interpreted language like Matlab, Python, Perl, S+, or even (gasp!) BASIC, and a compiled language like C. Lush brings the best of both worlds by wrapping three languages into one: (1) a weakly-typed, garbage-collected, dynamically scoped, interpreted language with a simple Lisp-like syntax, (2) a strongly-typed, lexically-scoped compiled language that uses the same Lisp-like syntax, and (3) the C language, which can be freely mixed with Lush code within a single program, even within a single function. It sounds complicated, but it is not. In fact, Lush is designed to be very simple to learn and easy to use.
If you do research and development in signal processing, image processing, machine learning, computer vision, bio-informatics, data mining, statistics, simulation, optimization, or artificial intelligence, and feel limited by Matlab and other existing tools, Lush is for you. If you want a simple environment to experiment with graphics, video, and sounds, Lush is for you.
Lush's main features include:
- a clean, simple, and easy to learn Lisp-like syntax;
- a compiler that produces very efficient C code and relies on the C compiler to produce efficient native code;
- an easy way to interface to C functions and libraries, and a powerful dynamic linker/loader for object files or libraries written in other compiled languages;
- the ability to freely mix Lisp and C in a single function;
- a powerful set of vector/matrix/tensor operations;
- a library of over 10,000 numerical routines, including full interfaces to GSL, LAPACK and BLAS;
- a library of image and signal processing routines;
- an extensive set of graphics routines, including an object-oriented GUI toolkit, an interface to OpenGL, and an interface to OpenRM;
- an interface to the Simple Directmedia Layer (SDL) multimedia library;
- sound and video grabbing (via ALSA and Video4Linux);
- several libraries for machine learning, neural nets, statistical estimation, etc.;
- libraries for computer vision and 3-D scene rendering;
- JavaVM and Python C API bindings.
Tuesday, November 19, 2002
PubScience Shuts Down
An InfoAnarchy article tells about the demise of a public database.
After intense lobbying from the Software & Information Industry Association (SIIA), the United States Department of Energy has shut down the PubScience database. On the site there is now only a message: "PubScience has been discontinued". The old static content can still be viewed in the Web Archive (the database functionality is of course not mirrored there).
PubScience was an index of scientific journals used by DoE researchers to publish their findings. It was not a full text database. A PubScience search result would give you the abstract of an article and a link to the publisher's page, where, depending on the publisher, you could either view the article for free, or buy it. In other words, it was a private/public partnership: PubScience provided the summaries for free, and if you wanted to read the full text, you usually needed a subscription or had to pay per article.
Nevertheless, there are commercial indexing services competing with PubScience, which offer some data for free, but often charge substantial sums even for viewing the abstracts (and will likely raise their prices now that the free competition is out of business). The SIIA lobbying eventually resulted in a bill with an attached report that recommended the closure of PubScience because of its competition with commercial services. The whole thing cost only $500,000 a year but was the most popular website of the DoE.
Dan Kaminsky of DoxPara Research has released Paketto Keiretsu, described at Slashdot as "a collection of five interwoven 'proofs of concept' that explore, extract and expose previously untapped capacities embedded deep within networks and their stacks." Brief descriptions of the five components are provided (although he provides another set of descriptions in a Slashdot post entitled What Paketto Is (In Simpler Terms)):
A package containing implementations of each is available under the BSD License.
Scanrand is a proof of concept, investigating stateless manipulation of the TCP Finite State Machine. It implements extremely fast and efficient port, host, and network trace scanning, and does so with two completely separate and disconnected processes -- one that sends queries, the other that receives responses and reconstructs the original message from the returned content. Security is maintained, in the sense that false results are difficult to forge, by embedding a cryptographic signature in the outgoing requests which must be detected in any received response. HMAC-SHA1, truncated to 32 bits, is used for this "Inverse SYN Cookie".
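The inverse-SYN-cookie trick is simple to sketch. The key, packed field layout, and addresses below are invented for illustration and are not Scanrand's actual wire format; the point is that the sender can verify any response statelessly by recomputing a truncated HMAC-SHA1 tag from the response's own addresses.

```python
import hmac
import hashlib
import struct

KEY = b"scanrand-secret"                # per-run secret key (made up)

def cookie(src_ip, src_port, dst_ip, dst_port):
    """32-bit tag derived from the connection 4-tuple via HMAC-SHA1."""
    msg = struct.pack("!4sH4sH", src_ip, src_port, dst_ip, dst_port)
    tag = hmac.new(KEY, msg, hashlib.sha1).digest()
    return struct.unpack("!I", tag[:4])[0]      # truncate to 32 bits

# Sender: stamp the outgoing probe (e.g. as the TCP sequence number),
# keeping no per-probe state at all.
seq = cookie(b"\x0a\x00\x00\x01", 40000, b"\xc0\xa8\x01\x05", 80)

# Receiver process: recompute the tag from the response and compare.
# A matching tag authenticates the response; a forgery will not match.
assert seq == cookie(b"\x0a\x00\x00\x01", 40000, b"\xc0\xa8\x01\x05", 80)
assert seq != cookie(b"\x0a\x00\x00\x01", 40000, b"\xc0\xa8\x01\x05", 81)
print("tag: %08x" % seq)
```

Because the tag is a keyed function of the addresses, the sending and receiving halves of the scanner need share nothing but the key.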
Minewt is a minimal "testbed" implementation of a stateful address translation gateway, rendered so entirely in userspace that not even the hardware addresses of the gateway correspond to what the kernel is operating against. Minewt implements what is commonly referred to as NAT, as well as a Doxpara-developed technique known as MAT. MAT, or MAC Address Translation, allows several backend hosts to share the same IP address, by dropping the static ARP cache and merging Layer 2 information into the NAT state table. Minewt's ability to manipulate MAC addresses also allows it to demonstrate Guerilla Multicast, which allows multiple hosts on the same subnet to receive a unicasted TCP/UDP datastream from the outside world. Minewt is not a firewall, and should not be treated as such.
Linkcat (lc) attempts to do for Layer 2 (Ethernet) what Netcat (nc) does for Layers 4-7 (TCP/UDP): provide direct, bidirectional, streaming access to the network. Libpcap/tcpdump-syntax filters may be specified in either direction, but no filtering is enabled by default. Two separate syntaxes are supported; one accepts and emits libpcap dump format (raw binary with a fixed-size file header and a fixed-size packet header), the other accepts and emits simple hex with backslash line continuation. Several other features are also implemented; specifically, early work involving the embedding of cryptographic shared-secret signatures in the Ethernet trailer is demonstrated.
Phentropy plots an arbitrarily large data source (of arbitrary data) onto a three dimensional volumetric matrix, which may then be parsed by OpenQVIS. Data mapping is accomplished by interpreting the file as a one dimensional stream of integers and progressively mapping quads in phase space. This process is reasonably straightforward: Take four numbers. Make X equal to the second number minus the first number. Make Y equal to the third number minus the second number. Then make Z equal to the last number minus the third number. Given the XYZ coordinate, draw a point. It turns out that many, many non-random datasets will have extraordinarily apparent regions in 3-space with increased density, reflecting common rates of change of the apparently random dataset. These regions are referred to as Strange Attractors, and can be used to predict future values from an otherwise random system.
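The phase-space mapping described above is easy to prototype. The following sketch is plain Python on made-up toy data, not Phentropy's actual file handling or OpenQVIS output; it shows why a structured stream collapses into a few dense points while a random one scatters.

```python
import random

def phase_points(data):
    """Map each sliding window of four values to an (X, Y, Z) delta triple."""
    points = []
    for i in range(len(data) - 3):
        a, b, c, d = data[i:i + 4]
        points.append((b - a, c - b, d - c))   # the three successive deltas
    return points

# A highly regular stream (a repeating byte counter)...
regular = bytes(range(256)) * 4
# ...versus pseudo-random noise of the same length.
random.seed(1)
noisy = bytes(random.randrange(256) for _ in range(len(regular)))

dense = set(phase_points(regular))
sparse = set(phase_points(noisy))

# The structured stream collapses onto a handful of points in 3-space
# (its rates of change repeat); the random one scatters widely.
print(len(dense), len(sparse))
```

Those few dense regions are the "strange attractors" in the description: visible structure in data that looks random one byte at a time.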
Paratrace traces the path between a client and a server, much like "traceroute", but with a major twist: rather than iterate the TTLs of UDP, ICMP, or even TCP SYN packets, paratrace attaches itself to an existing, stateful-firewall-approved TCP flow, statelessly releasing as many TCP keepalive messages as the software estimates the remote host is hop-distant. The resultant ICMP Time Exceeded replies are analyzed, with their original hopcount "tattooed" in the IPID field copied into the returned packets by so many helpful routers. Through this process, paratrace can trace a route without modulating a single byte of TCP/Layer 4, and thus delivers fully valid (if occasionally redundant) segments at Layer 4 -- segments generated by another process entirely.
Monday, November 18, 2002
LINUX SOFTWARE ENCYCLOPEDIA
I've been keeping the Linux Software Encyclopedia going for nearly seven years now, although updates have been very sporadic over the last two or three. You'll find a horrific number of dead links, but the installation of a DSL line at my Unindicted Co-Conspirator's place will allow me to get back to occasional updates in the near future.
PROGRAMMING TEXTS AND TUTORIALS
I've recently had occasion to update and expand several parts of my Programming Texts and Tutorials page. The dotcom meltdown has led to more than a few dead links, but most of the stuff I originally found 3 or 4 years ago is still out there, along with all sorts of spiffy new things. Follow the links in the "Recent Changes" section to find the updated sections. Enjoy.
QUANTUM PROGRAMMING LANGUAGE
The fine folks at Lambda the Ultimate, a weblog whose discovery provided a lot of the impetus for splitting my software stuff off into this weblog, have found a fascinating paper about the development of a language for quantum programming. It's 42 pages and available in DVI, PostScript and PDF.
LAPACK/BLAS AND CLUSTERING
In the course of investigating options for a cluster of Linux boxes hereabouts, I've cobbled together the beginnings of a Scientific Computing on Linux web page. It currently contains information about the varieties of BLAS and LAPACK available for Linux programs, as well as overviews of the available clustering packages.
Tuesday, November 12, 2002
MIT and Hewlett-Packard have collaborated on a software package called DSpace that provides a way to preserve and share the intellectual output of research.
DSpace is an open source software platform that enables institutions to:
- capture and describe digital works using a submission workflow module;
- distribute an institution's digital works over the web through a search and retrieval system; and
- preserve digital works over the long term.
Marvel at the features, read the DSpace System Documentation and download it from SourceForge.
The requirements for installing and using this are a UNIX-like OS (tested with HP-UX, Linux and OS X), Java, JavaBeans Activation Framework, JavaServer Pages, JavaMail, Tomcat, Apache, Apache Ant and PostgreSQL. This is a veritable leverage-fest.
THE mpC PARALLEL PROGRAMMING ENVIRONMENT
The same folks who brought you C[] have also developed the mpC parallel programming environment, with the former being a subset of the latter. The details:
mpC is a high-level parallel language (an extension of ANSI C), designed specially to develop portable adaptable applications for heterogeneous networks of computers. The main idea underlying mpC is that an mpC application explicitly defines an abstract network and distributes data, computations and communications over the network. The mpC programming system uses this information to map the abstract network to any real executing network in such a way that ensures efficient running of the application on this real network. This mapping is performed in run time and based on information about performances of processors and links of the real network, dynamically adapting the program to the executing network.
The documentation includes:
The mpC programming system includes a compiler, a run-time support system (RTSS), a library, and a command-line user interface. The compiler translates a source mpC program into an ANSI C program with calls to RTSS functions. RTSS manages the processes constituting the parallel program and provides communications. It encapsulates a particular communication platform (currently, a subset of MPI), ensuring platform independence of the rest of the system components.
Monday, November 11, 2002
THE C[] PROGRAMMING LANGUAGE
C[] provides Fortran 90-like arrays for C via a pre-compiler. According to the authors at the Institute for System Programming at the Russian Academy of Sciences:
The C[] (pronounced "see brackets") programming language is a Fortran 90-like C extension. While preserving all ANSI C syntax and semantics, it introduces powerful new facilities for array processing.
The C[] programming language is aimed at producing portable, tunable and efficient code for a variety of modern platforms. In particular, systems with multilevel memory hierarchy and instruction-level parallelism are supported.
Support for array-based computations is provided. The language permits arrays to be manipulated as single objects. The C[] syntax offers a natural form for expressing array-based computations, which also allows the compiler to fully utilize the performance potential of a target platform.
The key C[] features are:
- Access to an array as a whole as well as access to both regular and irregular segments of an array
- Variable-size (dynamic) arrays
- Variety of elementwise and reduction operators
C[] is a subset of the mpC programming language. While C[] addresses instruction-level parallelism and the memory hierarchy of a single-chip platform, mpC is aimed at exploiting the parallelism of distributed memory architectures. Thus, mpC provides a way for comprehensive utilization of the performance potential of a target platform (for example, a network of UNIX workstations) at all levels.
The cbc compiler translates source code files of the form *.cb into C files, which are then compiled by your native C compiler via, e.g.,
gcc -I/usr/local/CBC/h test.c
where /usr/local/CBC/h is the subdirectory of the C[] root directory containing the needed include files.
The documentation includes: