On this page you will find some "real world" multi-dimensional data sets. At the moment there are two Tiger data sets, extracted from the US Bureau of Census TIGER database by some unknown person (if you know who, please send me email so I can give the appropriate credit), and a few CFD data sets. This work was partially supported by NSF grant number 9610270.
Only the small data sets are given in ascii format; the rest are in binary. Included is a simple (and not very elegant) C program to convert from the binary format to an ascii format. There is just enough documentation at the top to show how to use it. Here it is: ( b2a.c )
If you use any of these CFD data sets, please reference this web page and the creator of the data:
D.J. Mavriplis, "An Advancing Front Delaunay Triangulation Algorithm Designed for Robustness," Journal of Computational Physics, vol. 117, pp. 90-101, 1995.
These are vertex data sets from various Computational Fluid Dynamics models. The data sets are for a 2-dimensional problem: a system of equations used to model the air flow over and around aerospace vehicles (Mavriplis, 1995). The data sets are for a cross section of a Boeing 737 wing with flaps out in landing configuration at Mach 0.2. The data space consists of a collection of points (nodes) of varying density: nodes are dense in areas of great change in the solution of the equations and sparse in areas of little change. The locations of the points in the data set are HIGHLY skewed.
The format of the data sets is (xmin ymin xmax ymax), which delimit the lower left hand corner and upper right hand corner of a rectangle. Since these data sets are point data, xmin = xmax and ymin = ymax. The points range from (-20,-20) to (20,20), but I have normalized the data sets to the unit square to facilitate experimental studies that include other data sets. For the 5088 Node Set the original (non-normalized) ascii version, the normalized ascii version, and the normalized binary version are all included. For the larger data sets only the normalized binary version is included. If for some reason you need the non-normalized versions, contact me. The binary is normal IEEE binary, four floats per point.
The three data sets differ mostly in the number of points. To get a good idea of what the data look like, download and preview the postscript picture below for the 5088 Vertex Data Set. The complete normalized picture shows the whole data set; the blow-up shows just the region around the centroid.
For a more complete description check out section 3 of the paper "STR: A Simple and Efficient Algorithm for R-Tree Packing" on the Leut Pubs page. Note: if you have a copy from before 9/3/96, the paper has since been changed; the old version does NOT explain this data set.
These data sets are Delaunay triangulations (with a few flips) of the point data sets above. See the postscript picture included below. These data sets are originally in a space-efficient format. Consider first the 9759 Triangle Set below. The original ascii file shows the format: each line contains the triangle number followed by the point numbers of the three corners of the triangle. To get the coordinates of these triangle vertices, just look up the point in the 5088 original ascii point data set (above). Each triangle of the data set is then bounded by a rectangle, as shown in the non-normalized rectangle-bounded set below. Next, the set of rectangles is normalized to the unit square. Finally, the normalized rectangle set is converted to binary. For the larger data sets only the normalized binary data sets are included.
These are line segment data sets containing the road maps of Long Beach and Montgomery counties. If you use these, please reference as: "Extracted from the US Bureau of Census TIGER database".