Notes and observations on software for (mostly) scientific applications.
Tuesday, March 30, 2004
The Algorithm Development and Mining (ADaM) system being developed at the
University of Alabama in Huntsville looks well-designed and useful.
ADaM...is used to apply data mining technologies to remotely-sensed and other scientific data. The mining and image processing toolkits consist of interoperable components that can be linked together in a variety of ways for application to diverse problem domains. ADaM has over 75 components that can be configured to create customized mining processes. Preprocessing and analysis utilities aid users in applying data mining to their specific problems. New components can easily be added to adapt the system to different science problems.
The components are grouped into several toolkits.
ADaM 4.0 components are general purpose mining and image processing modules that can be easily reused for multiple solutions and disciplines. These components are well positioned to address the needs for distributed mining and image processing services in web and grid applications.
More on the component architecture:
- classification techniques
- clustering techniques
- pattern recognition utilities
- association rules
- optimization techniques
- basic image processing operations
- segmentation/edge and shape detection
- texture features
ADaM's component architecture is designed to take advantage of emerging computational environments such as the Web and information Grids. Individual ADaM operations can execute in a stand-alone fashion, facilitating their use in distributed and parallel processing systems. The operations - organized as toolkits - provide pattern recognition, image processing, optimization, and association rule mining capabilities. Components are packaged in several ways, including C/C++ libraries, executables, and Python modules. Multi-interface component packaging facilitates rapid prototyping and efficient, performance-critical data mining application development. This approach also facilitates the use of ADaM components by and with third-party analysis and visualization systems.
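The pipeline idea is easy to picture in code. This is only a sketch of the component-chaining style described above, not ADaM's actual API; the `threshold`, `erode`, and `pipeline` names are mine.

```python
# Hypothetical sketch of a component pipeline in the ADaM style: each
# operation is a stand-alone function that takes an array and returns a
# new one, so operations can be chained, swapped, or run remotely.
import numpy as np

def threshold(image, level):
    """Segmentation component: binary mask of pixels above `level`."""
    return (image > level).astype(np.uint8)

def erode(mask):
    """Basic image-processing component: 4-neighbor binary erosion."""
    padded = np.pad(mask, 1)
    return (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
            & padded[1:-1, :-2] & padded[1:-1, 2:])

def pipeline(image, components):
    """Link components together in sequence, one feeding the next."""
    for component in components:
        image = component(image)
    return image

image = np.array([[0, 5, 5, 0],
                  [5, 9, 9, 5],
                  [5, 9, 9, 5],
                  [0, 5, 5, 0]], dtype=float)
mask = pipeline(image, [lambda im: threshold(im, 4), erode])
```

Because each stage is a plain function over arrays, the same components could just as easily be wrapped as executables or web/grid services, which seems to be exactly the multi-interface packaging they describe.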
Monday, March 29, 2004
While looking for some way - any way - to increase the performance of our
computational cluster, I found the DataSpace project.
DataSpace is a web services based infrastructure for exploring, analyzing, and mining remote and distributed data. This site describes DataSpace protocols, DataSpace applications, and open source DataSpace servers and clients.
The most interesting bit was the
SABUL protocol, although it is currently being superseded by UDT.
DataSpace applications employ a protocol for working with remote and distributed data called the DataSpace Transfer Protocol or DSTP. DSTP simplifies working with data by providing direct support for common operations, such as working with attributes, keys and metadata.
The DSTP protocol can be layered over specialized high performance transport protocols such as SABUL. Using protocols such as SABUL, DataSpace applications can effectively work on wide area high performance OC-3, OC-12 and Gbps networks. SABUL currently holds the landspeed record for connecting two distributed clusters, a record set at iGrid 02.
Starting in 2003, we began developing a new version of SABUL called UDP-based Data Transport Protocol or UDT, which uses UDP for both the control and data channel. An open challenge is to design protocols for high performance data transport so that they are friendly to both other flows using the same protocol (intra-protocol fairness) and to other flows employing different protocols, such as TCP (TCP friendliness). In both simulation and experimental studies using UDT, we have found UDT to be fair to both dozens of other UDT flows as well as friendly to hundreds of concurrent TCP flows.
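To get a feel for what "UDP for both the control and data channel" means, here is a toy stop-and-wait transfer over UDP. This is nothing like real UDT - it has no windowing, rate control, or fairness logic, and all the names are mine - but it shows the basic move of doing reliability in user space over UDP instead of leaning on TCP.

```python
# Toy reliable transfer over UDP: data packets carry a sequence number,
# and ACKs come back over the same UDP socket, so both control and data
# traffic ride on UDP, as in UDT's design. Purely illustrative.
import socket, struct

def send_message(sock, peer, payload, chunk=4, timeout=0.5):
    """Split payload into numbered chunks; retransmit until ACKed."""
    sock.settimeout(timeout)
    chunks = [payload[i:i + chunk] for i in range(0, len(payload), chunk)]
    for seq, data in enumerate(chunks):
        packet = struct.pack("!I", seq) + data
        while True:
            sock.sendto(packet, peer)
            try:
                ack, _ = sock.recvfrom(4)
                if struct.unpack("!I", ack)[0] == seq:
                    break          # chunk acknowledged, move on
            except socket.timeout:
                continue           # lost packet or ACK: retransmit

def recv_message(sock, nbytes):
    """Reassemble chunks in order, ACKing each sequence number."""
    received, expected = b"", 0
    while len(received) < nbytes:
        packet, peer = sock.recvfrom(2048)
        seq = struct.unpack("!I", packet[:4])[0]
        if seq == expected:        # in-order chunk: accept it
            received += packet[4:]
            expected += 1
        sock.sendto(struct.pack("!I", seq), peer)  # (re)send ACK
    return received
```

The hard part UDT actually solves - pacing dozens of its own flows fairly while staying friendly to hundreds of TCP flows - lives in the congestion control that this sketch leaves out entirely.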
Now all I have to do is figure out how frigging hard it would be to implement the UDT protocol in place
of TCP/IP on the cluster, and I've not yet begun to spin up sufficiently to appreciate the complexities
of the task.
HYPERSPECTRAL CHALLENGE PROBLEM
The Reconfigurable Computing Systems folks at LANL are tackling the
Hyperspectral Challenge Problem.
In the area of remote sensing, civilian, industrial, and military applications are becoming increasingly overwhelmed by the volume and complexity of imaging data now being collected. Airborne remote sensing was originally limited to black and white imagery, but now multi- and hyperspectral image sensors are delivering datasets with dozens, hundreds, or thousands of spectral channels per image spatial pixel. The processing of these datasets in real time has become a very difficult problem. We provide here an overview (large!) and example data (very large) and C code for a variety of platforms. The data set provided here must be processed within 3 seconds to meet real time requirements for representative systems. Typical Linux workstations have been clocked with this problem at a few minutes.
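To see why the budget is so tight, consider a common per-pixel hyperspectral operation: the spectral angle between each pixel's spectrum and a reference target spectrum. This is not the LANL benchmark code, just an illustration with a synthetic cube; even fully vectorized, the per-pixel work scales with rows × cols × bands.

```python
# Illustrative hyperspectral kernel: spectral angle mapping of every
# pixel in a (rows, cols, bands) data cube against a target spectrum.
import numpy as np

def spectral_angle(cube, target):
    """Return the per-pixel angle (radians) to the target spectrum."""
    dots = cube @ target                                    # (rows, cols)
    norms = np.linalg.norm(cube, axis=2) * np.linalg.norm(target)
    return np.arccos(np.clip(dots / norms, -1.0, 1.0))

rng = np.random.default_rng(0)
cube = rng.random((256, 256, 64))       # small synthetic data cube
target = rng.random(64)
angles = spectral_angle(cube, target)   # one angle per spatial pixel
```

A pixel whose spectrum matches the target gives an angle near zero, so thresholding `angles` yields a detection map; doing this for many targets across full-size cubes is where the 3-second limit starts to bite and reconfigurable hardware becomes attractive.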
ISIS is a most interesting package for searching for patterns in images and signals,
the name of which brings back fond memories of lusting over the woman who
played the title character in a Saturday morning TV series with the same
name in the early 1970s.
Their take on the matter:
The Los Alamos ISIS program is developing a set of software packages and reconfigurable computing hardware to enable rapid exploration and analysis of images and signals. Packages in the ISIS software suite build customized, robust, automated algorithms for feature extraction and analysis. With current sensor platforms collecting a flood of high-quality data, automatic feature extraction (AFE) has become a key to enabling human analysts to keep up with the flow. The ISIS software packages produce AFE tools for features in multispectral, hyperspectral, panchromatic, and multi-instrument fused imagery. Both spectral and spatial signatures of image features are discovered and exploited. The software features an interactive graphical user interface, and a parallel/scalable processing backend that runs on off-the-shelf computers.
The Los Alamos ISIS software suite currently includes four "tool-maker" image/signal processing packages, as well as a common point-and-click graphical user interface (called Aladdin) for providing training data and running the tool-makers. The four tool-maker packages are:
- GENIE, which uses techniques from genetic programming to build customized spatio-spectral algorithms for a wide range of sensors (electro-optical, infrared, and other modalities; panchromatic through hyperspectral data). GENIE is designed to process imagery, and has also been applied to image-like signals (e.g., "waterfall" displays).
- POOKA, which combines reconfigurable computing hardware with evolutionary algorithms to allow rapid prototyping of image and signal processing algorithms implemented at the chip level. This enables POOKA to rapidly produce customized automated feature extraction algorithms that run hundreds of times faster than equivalent algorithms implemented in software. POOKA uses a commercially available reconfigurable computing board that plugs into standard Windows workstations.
- Afreet, which exploits recent advances in computational machine learning theory, combining adaptive spatio-spectral image processing with a powerful support vector machine (SVM) supervised classifier to process imagery and image-like signals.
- Zeus, which specializes in signal processing, using evolutionary computational techniques to build signal classification algorithms. Zeus is the newest member of the ISIS suite of toolmakers.
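The evolutionary angle shared by GENIE, POOKA, and Zeus is easy to demonstrate in miniature. The toy below evolves a single intensity threshold so that a thresholding "feature extractor" best matches hand-labeled training pixels; real GENIE evolves whole spatio-spectral algorithm trees, and every name here is made up for illustration.

```python
# Skeleton of the evolutionary approach: score candidate extractors on
# labeled training pixels, keep the best, mutate them, and repeat.
import random

def fitness(threshold, pixels, labels):
    """Fraction of training pixels the threshold classifies correctly."""
    return sum((p > threshold) == bool(l)
               for p, l in zip(pixels, labels)) / len(pixels)

def evolve(pixels, labels, generations=40, popsize=20, seed=1):
    rng = random.Random(seed)
    population = [rng.uniform(0, 255) for _ in range(popsize)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, pixels, labels),
                        reverse=True)
        parents = population[:popsize // 2]                 # selection
        children = [p + rng.gauss(0, 5) for p in parents]   # mutation
        population = parents + children
    return max(population, key=lambda t: fitness(t, pixels, labels))

# Training data: dim background pixels labeled 0, bright feature pixels 1.
pixels = [10, 20, 30, 40, 200, 210, 220, 230]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
best = evolve(pixels, labels)
```

The point of tool-makers like GENIE is that the analyst only supplies the labels (via Aladdin's point-and-click interface) and the evolutionary search does the algorithm construction - here a single number, there an entire processing pipeline.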