Notes and observations on software for (mostly) scientific applications.
Thursday, November 21, 2002
Interesting 11/20/02 and 11/19/02 Freshmeat items include:
- breve - a 3-D simulation environment for decentralized systems and artificial life
- OpenMosix - a set of Linux kernel extensions for building a computer cluster
- Jmol - a molecule viewer and editor
- RKWard - an extensible GUI for R
- KRoC - a retargetable Occam compiler
- BioConductor - a set of R packages for bioinformatics data analysis
- RLPlot - a plotting program for creating high quality graphs from data
- RefDB - a reference database and bibliography tool for SGML, XML and LaTeX/BibTeX documents
THE ZEN OF COMPREHENSIVE ARCHIVE NETWORKS
Jarkko Hietaniemi, the CPAN Master Librarian, writes (via use Perl;) about applying the CPAN model to other languages/projects.
"It seems that there is a lot of interest in having similar archives for other languages like CPAN  is for Perl. I should know; over the years people from at least Python, Ruby, and Java communities have approached me or other core CPAN people to ask basically 'How did we do it?'. Very recently I've seen even more interest from some people in the Perl community wanting to actively reach out a helping hand to other communities. This 'missive' tries to describe my thinking and help people wanting to build their own CANs. Since I hope this message will somehow end up reaching the other language communities I will explicitly include URLs that are (hopefully) obvious to Perl people."
DAILY PYTHON-URL EXTRACTS
Recent items from Daily Python-URL that may be of interest include:
Skipping (painfully) the obvious puns for now, I'll get right to the author's description:
Lush is an object-oriented programming language designed for researchers, experimenters, and engineers interested in large-scale numerical and graphic applications. Lush is designed to be used in situations where one would want to combine the flexibility of a high-level, loosely-typed interpreted language, with the efficiency of a strongly-typed, natively-compiled language, and with the easy integration of code written in C, C++, or other languages.
Lush can be used advantageously for projects where one would otherwise use a combination of an interpreted language like Matlab, Python, Perl, S+, or even (gasp!) BASIC, and a compiled language like C. Lush brings the best of both worlds by wrapping three languages into one: (1) a weakly-typed, garbage-collected, dynamically scoped, interpreted language with a simple Lisp-like syntax, (2) a strongly-typed, lexically-scoped compiled language that uses the same Lisp-like syntax, and (3) the C language, which can be freely mixed with Lush code within a single program, even within a single function. It sounds complicated, but it is not. In fact, Lush is designed to be very simple to learn and easy to use.
If you do research and development in signal processing, image processing, machine learning, computer vision, bio-informatics, data mining, statistics, simulation, optimization, or artificial intelligence, and feel limited by Matlab and other existing tools, Lush is for you. If you want a simple environment to experiment with graphics, video, and sounds, Lush is for you.
Lush's main features include:
- a clean, simple, and easy to learn Lisp-like syntax;
- a compiler that produces very efficient C code and relies on the C compiler to produce efficient native code;
- an easy way to interface to C functions and libraries, and a powerful dynamic linker/loader for object files or libraries written in other compiled languages;
- the ability to freely mix Lisp and C in a single function;
- a powerful set of vector/matrix/tensor operations;
- a library of over 10,000 numerical routines, including full interfaces to GSL, LAPACK and BLAS;
- a library of image and signal processing routines;
- an extensive set of graphics routines, including an object-oriented GUI toolkit, an interface to OpenGL, and an interface to OpenRM;
- an interface to the Simple Directmedia Layer (SDL) multimedia library;
- sound and video grabbing (via ALSA and Video4Linux);
- several libraries for machine learning, neural nets, statistical estimation, etc.;
- libraries for computer vision and 3-D scene rendering;
- JavaVM and Python C API bindings.
The documentation includes a 600-page reference manual and a brief tutorial.
Tuesday, November 19, 2002
PubScience Shuts Down
An InfoAnarchy article tells about the demise of a public database.
After intense lobbying from the Software & Information Industry Association (SIIA), the United States Department of Energy has shut down the PubScience database. On the site there is now only a message: "PubScience has been discontinued". The old static content can still be viewed in the Web Archive (the database functionality is of course not mirrored there).
PubScience was an index of scientific journals used by DoE researchers to publish their findings. It was not a full text database. A PubScience search result would give you the abstract of an article and a link to the publisher's page, where, depending on the publisher, you could either view the article for free, or buy it. In other words, it was a private/public partnership: PubScience provided the summaries for free, and if you wanted to read the full text, you usually needed a subscription or had to pay per article.
Nevertheless, there are commercial indexing services competing with PubScience, which offer some data for free, but often charge substantial sums even for viewing the abstracts (and will likely raise their prices now that the free competition is out of business). The SIIA lobbying eventually resulted in a bill with an attached report that recommended the closure of PubScience because of its competition with commercial services. The whole thing cost only $500,000 a year but was the most popular website of the DoE.
Dan Kaminsky of DoxPara Research has released Paketto Keiretsu, described at Slashdot as "a collection of five interwoven 'proofs of concept' that explore, extract and expose previously untapped capacities embedded deep within networks and their stacks." Brief descriptions of the five components are provided (although he provides another set of descriptions in a Slashdot post entitled What Paketto Is (In Simpler Terms)):
Scanrand is a proof of concept, investigating stateless manipulation of the TCP Finite State Machine. It implements extremely fast and efficient port, host, and network trace scanning, and does so with two completely separate and disconnected processes -- one that sends queries, the other that receives responses and reconstructs the original message from the returned content. Security is maintained, in the sense that false results are difficult to forge, by embedding a cryptographic signature in the outgoing requests which must be detected in any received response. HMAC-SHA1, truncated to 32 bits, is used for this "Inverse SYN Cookie".
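To make the "Inverse SYN Cookie" idea a bit more concrete, here is a minimal Python sketch of how a sender might derive a 32-bit value from HMAC-SHA1 and how a separate receiver could check a reply without any shared per-probe state. The function names, the choice of fields fed to the HMAC (addresses and ports), and the use of the TCP sequence number to carry the value are assumptions for illustration, not a description of what scanrand actually does.

```python
import hashlib
import hmac
import socket
import struct


def inverse_syn_cookie(key: bytes, src_ip: str, src_port: int,
                       dst_ip: str, dst_port: int) -> int:
    """Derive a 32-bit value from HMAC-SHA1 over the probe's addresses and ports.

    The set of fields covered here is an assumption made for this sketch.
    """
    msg = (socket.inet_aton(src_ip) + struct.pack("!H", src_port)
           + socket.inet_aton(dst_ip) + struct.pack("!H", dst_port))
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    # Truncate the 160-bit digest to its first 32 bits.
    return struct.unpack("!I", digest[:4])[0]


def reply_is_genuine(key: bytes, src_ip: str, src_port: int,
                     dst_ip: str, dst_port: int, ack_number: int) -> bool:
    """Receiver side: recompute the value and compare it with the reply.

    src/dst are given from the scanner's point of view. A SYN-ACK
    acknowledges the sender's sequence number plus one, so the expected
    value is ack_number - 1 (modulo 2**32).
    """
    expected = inverse_syn_cookie(key, src_ip, src_port, dst_ip, dst_port)
    return (ack_number - 1) % 2**32 == expected
```

The point of the design is that the receiving process needs only the secret key: everything else it needs to validate a response comes back in the response itself.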
Minewt is a minimal "testbed" implementation of a stateful address translation gateway, rendered so entirely in userspace that not even the hardware addresses of the gateway correspond to what the kernel is operating against. Minewt implements what is commonly referred to as NAT, as well as a Doxpara-developed technique known as MAT. MAT, or MAC Address Translation, allows several backend hosts to share the same IP address, by dropping the static ARP cache and merging Layer 2 information into the NAT state table. Minewt's ability to manipulate MAC addresses also allows it to demonstrate Guerilla Multicast, which allows multiple hosts on the same subnet to receive a unicasted TCP/UDP datastream from the outside world. Minewt is not a firewall, and should not be treated as such.
Linkcat (lc) attempts to do to Layer 2 (Ethernet) what Netcat (nc) does for Layer 4-7 (TCP/UDP): Provide direct, bidirectional, streaming access to the network. Libpcap/tcpdump syntax filters may be specified in either direction, but no filtering is enabled by default. Two separate syntaxes are supported; one accepts and emits libpcap dump format (raw binary w/ a fixed size file header and a fixed size packet header), the other accepts and emits simple hex w/ backslash line continuation. Several other features are also implemented; specifically, early work involving the embedding of cryptographic shared-secret signatures in the Ethernet Trailer is demonstrated.
Phentropy plots an arbitrarily large data source (of arbitrary data) onto a three dimensional volumetric matrix, which may then be parsed by OpenQVIS. Data mapping is accomplished by interpreting the file as a one dimensional stream of integers and progressively mapping quads in phase space. This process is reasonably straightforward: Take four numbers. Make X equal to the second number minus the first number. Make Y equal to the third number minus the second number. Then make Z equal to the last number minus the third number. Given the XYZ coordinate, draw a point. It turns out that many, many non-random datasets will have extraordinarily apparent regions in 3-space with increased density, reflecting common rates of change of the apparently random dataset. These regions are referred to as Strange Attractors, and can be used to predict future values from an otherwise random system.
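The "take four numbers" recipe above can be sketched in a few lines of Python. This is only an illustration of the mapping as described, not Phentropy's own code; the function name is hypothetical, and sliding the four-number window forward one value at a time is an assumption about what "progressively mapping quads" means.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def phase_space_points(values: Iterable[int]) -> Iterator[Tuple[int, int, int]]:
    """Yield one (x, y, z) point per four consecutive values in the stream."""
    # A deque with maxlen=4 drops the oldest value on each append,
    # giving a window that slides forward by one (an assumption).
    window = deque(maxlen=4)
    for value in values:
        window.append(value)
        if len(window) == 4:
            a, b, c, d = window
            # X = second - first, Y = third - second, Z = last - third.
            yield (b - a, c - b, d - c)


points = list(phase_space_points([1, 4, 9, 16, 25]))
# points == [(3, 5, 7), (5, 7, 9)]
```

Each point is just three successive first differences, which is why regular rates of change in the input show up as dense regions when the points are plotted.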
Paratrace traces the path between a client and a server, much like "traceroute", but with a major twist: Rather than iterate the TTLs of UDP, ICMP, or even TCP SYN packets, paratrace attaches itself to an existing, stateful-firewall-approved TCP flow, statelessly releasing as many TCP Keepalive messages as the software estimates the remote host is hop-distant. The resultant ICMP Time Exceeded replies are analyzed, with their original hopcount "tattooed" in the IPID field copied into the returned packets by so many helpful routers. Through this process, paratrace can trace a route without modulating a single byte of TCP/Layer 4, and thus delivers fully valid (if occasionally redundant) segments at Layer 4 -- segments generated by another process entirely.
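The IPID "tattoo" is easy to illustrate on its own. The Python sketch below builds a bare 20-byte IPv4 header whose identification field carries the probe's initial TTL, then recovers that hop count from the copy of the header that a router quotes back in an ICMP Time Exceeded message. The function names are hypothetical, the checksum is left at zero, and nothing is actually sent on the wire; it shows the bookkeeping only, not paratrace's implementation.

```python
import socket
import struct

# IPv4 header without options: version/IHL, TOS, total length, ID,
# flags/fragment offset, TTL, protocol, checksum, source, destination.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def build_probe_header(ttl: int, src_ip: str, dst_ip: str,
                       payload_len: int = 20) -> bytes:
    """Build an IPv4 header with the initial TTL copied into the ID field."""
    return IPV4_HEADER.pack(
        0x45,                      # version 4, header length 5 words
        0,                         # type of service
        IPV4_HEADER.size + payload_len,
        ttl,                       # ID field: the "tattooed" hop count
        0,                         # flags and fragment offset
        ttl,                       # TTL, decremented by each router
        socket.IPPROTO_TCP,
        0,                         # checksum not computed in this sketch
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    )


def hop_from_quoted_header(quoted: bytes) -> int:
    """Read the hop count back from a header quoted in an ICMP error."""
    fields = IPV4_HEADER.unpack(quoted[:IPV4_HEADER.size])
    return fields[3]               # the ID field survives the trip unchanged
```

The TTL itself cannot serve this purpose, since it has been counted down by the time the router quotes the header; the ID field comes back as it was sent.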
A package containing implementations of each is available under the BSD License.
Monday, November 18, 2002
LINUX SOFTWARE ENCYCLOPEDIA
I've been keeping the Linux Software Encyclopedia going for nearly seven years now, although updates have been very sporadic over the last two or three. You'll find a horrific number of dead links, but the installation of a DSL line at my Unindicted Co-Conspirator's place will allow me to get back to occasional updates in the near future.
PROGRAMMING TEXTS AND TUTORIALS
I've recently had occasion to update and expand several parts of my Programming Texts and Tutorials page. The dotcom meltdown has led to more than a few dead links, but most of the stuff I originally found 3 or 4 years ago is still out there, along with all sorts of spiffy new things. Follow the links in the "Recent Changes" section to find the updated sections. Enjoy.
QUANTUM PROGRAMMING LANGUAGE
The fine folks at Lambda the Ultimate, a weblog whose discovery provided a lot of the impetus for splitting my software stuff off into this weblog, have found a fascinating paper about the development of a language for quantum programming. It's 42 pages and available in DVI, PostScript and PDF.
LAPACK/BLAS AND CLUSTERING
In the course of investigating options for a cluster of Linux boxes hereabouts, I've cobbled together the beginnings of a Scientific Computing on Linux web page. It currently contains information about the varieties of BLAS and LAPACK available for Linux programs, as well as overviews of the available clustering packages.