By Joe Bartling, Bartling Forensic LLC
My recent article on The Enron Data Set – Where Did It Come From? got me to thinking that I should share some of the other experiences I’ve had supporting litigating attorneys and their E-Discovery challenges over the years. Here’s one from the early days.
In 1992, as a consultant specializing in Oracle database technologies and applications, I was hired as a contractor by Aspen Systems Corporation, now a part of Lockheed Martin, to help develop their new automated litigation technology platform, AspenView™. AspenView was announced in the April 1991 issue of the ABA Journal as follows:
“Aspen Systems Corporation introduces AspenView™, a powerful new optical imaging capability for litigators. AspenView creates fully searchable document databases that hold indexed and full-text files. The files can be viewed on-screen and printed to a laser printer without leaving your desk. This image system consists of imaging hardware and software, scanning services, full-text OCR scanning and editing services, and a Litigation Workstation. AspenView is fully compatible with standard hardware.”
AspenView ran on a local area network of Windows 3.1 PCs. It was programmed in Visual Basic and connected to a SCO Unix network server running the Oracle database. The application used the BRS/Search engine for text search and retrieval. The images were stored on network-connected optical ROM jukeboxes and printed using large HP network laser printers.
As an Oracle database expert, I helped Aspen during AspenView’s development with search and indexing strategies that optimized the application’s display and queueing of image files responsive to search queries, particularly when dozens of users were on the system at once. With some cases containing more than a million pages of images, we were definitely operating on the “bleeding edge” of litigation technology!
Of course, it wasn’t called “E-Discovery” back then. Litigation teams had “war rooms/document centers” with hundreds of shelves along the walls, sometimes holding thousands of boxes of original case documents. Clever “Bates” numbering systems were developed to help paralegals keep track of the “context” of documents. For example, a Bates number such as FLY-JD-DR01-F020-0015 might identify the document for case “FLY”, custodian “John Doe”, folder 20, page 15. These unique numbers helped the paralegal team find and organize their documents in stacks of redwell (actually Redweld) folders for the attorneys to review.
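To make the numbering scheme concrete, here is a minimal Python sketch that splits the example number into its parts. The five-segment layout and the field names are assumptions drawn only from that single example; the meaning of the third segment (“DR01”) is not spelled out above, so the sketch keeps it as an opaque string. Real Bates schemes varied from case to case.

```python
from typing import NamedTuple


class BatesNumber(NamedTuple):
    case: str       # case code, e.g. "FLY"
    custodian: str  # custodian initials, e.g. "JD" for John Doe
    location: str   # third segment, e.g. "DR01"; meaning not given in the article
    folder: int     # numeric part of the folder segment, e.g. 20 from "F020"
    page: int       # page number, e.g. 15 from "0015"


def parse_bates(number: str) -> BatesNumber:
    """Split a hyphen-delimited, five-segment Bates number (hypothetical layout)."""
    parts = number.split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 segments, got {len(parts)}: {number!r}")
    case, custodian, location, folder, page = parts
    if not folder.startswith("F"):
        raise ValueError(f"folder segment should start with 'F': {folder!r}")
    return BatesNumber(case, custodian, location, int(folder[1:]), int(page))


if __name__ == "__main__":
    print(parse_bates("FLY-JD-DR01-F020-0015"))
```

Because the context is encoded in the number itself, sorting or grouping on these fields puts documents back into the order in which they were collected.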
Aspen’s solution was quite ingenious. Prior to this time, attorneys or paralegals had to scour through boxes, each containing thousands of pages, to review the case material. They would pull out what was relevant to their matter, put a placeholder in the box, copy the document, and put the original back in its original folder, in its proper box, on the proper shelf.
With Aspen’s solution, all of the documents had been imaged in TIFF format and OCRed. The images were written to optical ROMs that were accessed with a jukebox. The file locations on the jukeboxes were stored in the Oracle database by filename and Bates number. The full OCR text of the scanned images was stored and indexed in the BRS database. Using AspenView, the user just searched for the documents using BRS search terms and simply queued them up for printing. Easy enough in principle, but hard in practice. It didn’t take long to have thousands of pages in the document printer queue, and things got jammed up. The printer was fast enough, but sometimes the jukebox would take 90 seconds to find and retrieve a single image file. And then it would have to chug-a-lug back and forth for the next image file on another ROM. Multiply that by a thousand pages and you’ve got big problems.
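The arithmetic behind that bottleneck, and one common way to soften it, can be sketched in a few lines of Python. This is an illustration only: the article does not describe how AspenView actually ordered its queue, and the function names, the `(disc_id, filename)` job shape, and the group-by-disc idea are all assumptions made for this example. The only figure taken from the text is the 90-second worst-case retrieval.

```python
from collections import defaultdict

WORST_CASE_SECONDS_PER_IMAGE = 90  # worst-case jukebox retrieval quoted above


def worst_case_hours(pages: int) -> float:
    """Hours needed if every page hits the worst-case retrieval time."""
    return pages * WORST_CASE_SECONDS_PER_IMAGE / 3600


def count_disc_loads(queue):
    """Count how many times the jukebox must load a disc for this job order."""
    loads = 0
    current = None
    for disc_id, _filename in queue:
        if disc_id != current:
            loads += 1
            current = disc_id
    return loads


def group_by_disc(queue):
    """Reorder jobs so each disc is loaded once (hypothetical strategy).

    Discs keep their first-seen order, and files keep their order within a disc.
    """
    by_disc = defaultdict(list)
    for disc_id, filename in queue:
        by_disc[disc_id].append(filename)
    return [(disc, name) for disc, names in by_disc.items() for name in names]


if __name__ == "__main__":
    queue = [("A", "1.tif"), ("B", "2.tif"), ("A", "3.tif"), ("B", "4.tif")]
    print(worst_case_hours(1000))                 # a thousand worst-case pages
    print(count_disc_loads(queue))                # loads in search-result order
    print(count_disc_loads(group_by_disc(queue))) # loads after grouping
```

A thousand pages at the worst-case rate works out to 25 hours of retrieval alone, which is why the order and size of the print queue mattered far more than the speed of the printer.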
On an important case, and with all of this brand new technology, circa 1992-1993, you needed someone on-site to make sure everything was working properly and to fix things when they broke, which happened several times a day!
So Aspen hired me as a contractor in 1993 to work on-site for six months in San Jose, California, to manage the installation and operation of the AspenView system supporting the legal team on IBM Corp vs Seagate Technology. Aspen put me up in a furnished apartment in San Jose, and I commuted daily to the IBM campus facility that hosted our litigation support “war room” and the work of the legal team.
At first, the temptation for the users was just to print out everything their search terms brought back. That turned out to be a recipe for disaster that needed to be avoided. So, I showed them that the key to having the system run at optimal performance was to develop tight search criteria, bring back a reasonable number of pages to print, and expand from there. OCR technology was horrendous in those days, and you had to search with stemming and “fuzzy” search to find much of what you were looking for. But broad, “fuzzy” searching brought back LOTS of documents, many of which were false positives. Working one-on-one with the team members, we were able to fine-tune an iterative strategy of searching and re-searching once more information about the documents was known.
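The trade-off between tight and “fuzzy” searching can be shown with a small Python sketch. It uses the standard library’s `difflib.SequenceMatcher` purely as a stand-in for a similarity measure; it is not how BRS/Search worked, and the sample pages, the `fuzzy_hits` helper, and the cutoff values are invented for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_hits(term, pages, cutoff):
    """Return ids of pages with at least one token similar enough to `term`.

    `pages` maps a page id to its OCR text. `cutoff` is a similarity ratio
    between 0 and 1; 1.0 means exact matches only.
    """
    term = term.lower()
    hits = []
    for page_id, text in pages.items():
        for token in text.lower().split():
            if SequenceMatcher(None, term, token).ratio() >= cutoff:
                hits.append(page_id)
                break  # one matching token is enough for this page
    return hits


if __name__ == "__main__":
    pages = {
        "p1": "the contract was signed",    # clean OCR
        "p2": "the c0ntract was signed",    # OCR misread "o" as "0"
        "p3": "please contact the office",  # different word, similar spelling
    }
    for cutoff in (1.0, 0.85):
        print(cutoff, fuzzy_hits("contract", pages, cutoff))
```

With these sample pages, the exact search misses the page where OCR garbled the word, while the looser cutoff recovers it but also pulls in the unrelated “contact” page. That is the false-positive problem in miniature, and the reason for starting tight and widening only as needed.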
Back in 1992-1993, NONE of the original materials we had in the case were in electronic form. Everything was paper. Custodian interviews captured information from questions such as “In which desk drawer or file cabinet did you keep your papers on project XYZ?” Some of that document collection information ended up being “coded” either in the “Bates” numbering system or in spreadsheets, sometimes manually prepared by the paralegals. These fields of captured information were the original “metadata”, and provided the document “context” for subsequent document search and retrieval. In many cases, documents were sent to be “coded”, where information about each document was kept in these spreadsheets, such as:
- the date of the document
- the title of the document
- the author of the document
- the addressee
- the type of document
- the names mentioned in the document
- whether the document was handwritten or typed
This information was *really* important because, for example, many documents were handwritten, and would NEVER be retrieved by an automated search. The images of the handwritten documents would have to be retrieved using these other fields.
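As a rough illustration of retrieval by coded fields, the Python sketch below models one row of a coding spreadsheet and filters on it. The record layout follows the list above, but the class name, the `find_by_coding` helper, and the sample rows are hypothetical; the actual coding sheets and AspenView’s schema are not described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CodedDocument:
    bates: str
    date: str            # ISO date string, e.g. "1991-03-04"
    title: str
    author: str
    addressee: str
    doc_type: str
    names_mentioned: List[str] = field(default_factory=list)
    handwritten: bool = False


def find_by_coding(
    docs: List[CodedDocument],
    *,
    author: Optional[str] = None,
    name_mentioned: Optional[str] = None,
    handwritten: Optional[bool] = None,
) -> List[CodedDocument]:
    """Return documents matching every criterion that was supplied."""
    results = []
    for doc in docs:
        if author is not None and doc.author != author:
            continue
        if name_mentioned is not None and name_mentioned not in doc.names_mentioned:
            continue
        if handwritten is not None and doc.handwritten != handwritten:
            continue
        results.append(doc)
    return results


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    docs = [
        CodedDocument("FLY-JD-DR01-F020-0015", "1991-03-04", "Meeting notes",
                      "John Doe", "File", "notes", ["Jane Roe"], handwritten=True),
        CodedDocument("FLY-JD-DR01-F020-0016", "1991-03-05", "Status memo",
                      "John Doe", "Jane Roe", "memo", ["Jane Roe"]),
    ]
    for doc in find_by_coding(docs, name_mentioned="Jane Roe", handwritten=True):
        print(doc.bates, doc.title)
```

A handwritten page has no usable OCR text, so a filter on coded fields like this is the only path back to its image.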
Between the coded information, the stored full text of the OCRed documents, and the ability to search and retrieve specific documents “in context” (even if they had to be printed), AspenView revolutionized the litigation technology industry, laying the groundwork for innovation in what became E-Discovery as we know it today and what it will be in the future.
I was fortunate to be able to play a “hands-on” role in such an important time in the industry and its history.
Copyright 2015 Bartling Forensic LLC