
 Setting Data Free
 by jeff covey, in Editorials - Mon, Nov 20th 2000 23:59 UTC

I see two trends in progress. In one, we're continuing movement towards application-independent data storage. In the other, we're witnessing a proliferation of devices that each store the same data in a unique and incompatible way. I believe it's a time to watch developments carefully, and to be ready to move our advocacy efforts to a new arena.


Copyright notice: All reader-contributed material on freshmeat.net is the property and responsibility of its author; for reprint rights, please contact the author directly.

A brief tour of my history with computers

In 1983, Texas Instruments acknowledged that their home computer business was beyond hope. The TI 99/4A, originally sold for $1,150, was discontinued, and stores (needing to dump their stockpiles during the Christmas shopping season) put the machines on sale for $50-$100 each. Lines formed in the parking lots in scenes presaging the crowds of hopefuls waiting all night for Playstations (or products of more questionable quality, such as Windows95 and Star Wars I). For some time leading up to this, I had been left in the electronics section of the department store whenever the family went shopping (I had pong at home, but it was still unimaginable magic to hit keys on a keyboard and see letters appear on the screen), so there was no question what to get me for Christmas. My 99/4A was connected to a TV on Christmas morning, and everyone knew where to find me for the next few years.

Eventually, we even got a cable that connected the computer to a cassette recorder, and it was possible to write programs to a cassette so you didn't have to type them in again every time you turned the machine on! If the volume was set just right and the wind was blowing south-southwest, the computer just might understand the contents of the tape and load your program every fifth time.

This was the system I had through the first couple of years of high school. I spent a geek-appropriate amount of time playing with "TI Extended BASIC" -- a lot of graphics and sound programming, a text adventure in which you explored the sunken city of Atlantis, a program that did virtual dice rolls to create random Dungeons & Dragons characters, one that plotted the Cartesian graphs we were studying in Math, etc.

At the start of my junior year, I transferred to a school that had a lab in the basement filled with Apple ][s. For the first time, I started to use a computer medium (the 5.25" floppy disk) to store information that mattered to me -- my school papers, letters, etc. When I went to college, I took my newly-purchased Apple ][e clone, and that was the machine that handled all my writing chores, balanced my checkbook, etc. for several years.

My geek power points rating was extremely low during this period; I just used the computer as a tool, and was no more involved with it than was the average secretary. It wasn't until I started living with someone who had a PC that I began to be interested in computers for their own sake again. We decided that the 286 with GeoWorks was not going to be sufficient for surfing the Web, so I bought some books and learned to build a 486. After two years of struggling with Windows95, I put my first Linux CD in the drive, and everything was better.

Mostly.

What it takes to retrieve old data

I now own information scattered across a variety of media -- audio cassettes, 5.25" floppies, 3.5" floppies, hard drives, and zip disks -- all at different levels of accessibility.

Information from the TI era
Assuming I could find the cassettes, I would have to dig out the 99/4A and hope I could get the cassette interface to work. Then what? I'm sure there's a FAQ somewhere that explains how to set up a serial connection to a PC, so I might get the code transferred and run it in an emulator, but it would be a hassle.
Information from the Apple era
A bit easier here; I know the Apple has a serial port (I'm not so sure about the TI), emulation software is handy, and I could probably at least yank the text out of my AppleWorks files.
Information from the GeoWorks era
No problem there; it ran on top of DOS, and saved to a DOS filesystem.
Information from the Windows era
No problem at all; when I switched to Linux, I just saved my word processor documents as text and copied them to my Linux partition before reformatting the Windows one as ext2.

The good news is that how hard it is to retrieve the data varies inversely to how important it is to me. I don't care much about what's on my TI cassettes (though I would be curious to see how I wrote the engine for my adventure game). The Apple disks are filled with school papers, short stories, and really bad poetry that I have in hard copy and should burn someday anyway.

Why I'm boring you with this

The good news is deceptive; I believe things have been getting progressively better, but we've turned a corner, and now they're getting worse, from both software and hardware perspectives.

As the good news says, it's become easier for personal/home computers to share information. My first computer, that TI 99/4A, had no way to share data with a Commodore 64, a Timex Sinclair, an Atari ST, a Tandy TRS-80, or any of the other computers around at the time. Say what you will about the near monopoly of IBM PC-compatible hardware, but it gave us a de facto standard that made it easier for our machines to share information. Today, the worst problem I can imagine is that someone would hand me a Mac floppy, and Internet use is now widespread enough that I could get away with asking her to email the files to me instead.

The new problems are:

  1. Hardware compatibility doesn't solve problems of software incompatibility.
  2. Our hardware is becoming incompatible once again, in ways that could make the time of the myriad of unique home computers seem like the good old days.

The software problem

Let's look at how I handle these important pieces of information:

  • email messages
  • addresses and phone numbers
  • notes
  • my schedule

I only started keeping this information on a computer when I got online, so I only have to trace back what I've done to the time of using a terminal program under GeoWorks:

email
My first use of email was accomplished by dialing into my university's Unix systems and using PINE. From there, I moved to Pegasus mail under Windows, to VM in Emacs under both Windows and Linux, and finally to mutt.
addresses and phone numbers
For a long time, I still kept a physical address book. Once I started using Emacs for mail and news reading, I discovered the Insidious Big Brother Database, which has all kinds of nifty features, such as the ability to pop up a window with the information about a person when you open a message from her.
notes
I've used notes-mode in Emacs to keep track of all the things I never get around to doing. Notes mode does automatic linking and indexing of note topics, so you can skip from one note about a topic forward or back to your other notes about it, get an index of all the notes sorted by topic, etc. Unfortunately (or maybe fortunately...), the indexing script has recently started puréeing my notes; I've opened a number of them and found that all but the first few words are missing. I've switched to using note. I'm not as happy with it, but I was even less happy with watching my notes disappear.
my schedule
I somehow gained possession of a Lotus calendar application once (a free gift for buying x amount of hardware from parts-r-us, I think), and used it to keep track of things for a year or so. Once I started living in Emacs, I used its calendar functions. Now I use Yahoo!'s calendar, so I can access it from anywhere, have it email reminders to me, etc.

The heart of the software problem is this question: How hard was it to move data from each application to the next?

PINE stored messages in mbox format. Pegasus used binary folder files. IIRC, I didn't have many saved messages at the time, and I just forwarded them all to myself. Going from Pegasus back to mbox for VM and mutt required something mildly unpleasant like getting Pegasus to write all the messages to separate files and then coercing them into one. I don't remember exactly what I had to do, but it wasn't too bad.
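
For illustration only, the "separate files coerced into one" step might look like the following minimal Python sketch, using the standard mailbox module. It assumes each exported file holds exactly one RFC 822 message; the function name files_to_mbox and the directory layout are hypothetical, and this is not the procedure actually used at the time.

```python
import mailbox
from email import message_from_bytes
from pathlib import Path


def files_to_mbox(message_dir, mbox_path):
    """Append every file in message_dir (assumed: one message each) to an mbox.

    Returns the number of messages written. No locking is done, so this
    sketch is only suitable for a mailbox nothing else is writing to.
    """
    box = mailbox.mbox(str(mbox_path))  # the file is created if missing
    count = 0
    try:
        for path in sorted(Path(message_dir).iterdir()):
            if path.is_file():
                # Parse the raw bytes so headers and body are kept as-is.
                box.add(message_from_bytes(path.read_bytes()))
                count += 1
        box.flush()
    finally:
        box.close()
    return count
```

Something like files_to_mbox("exported", "saved.mbox") would then leave a single file that mbox-reading clients such as VM or mutt could open.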

My first collection of email addresses was kept in my PINE address book. I downloaded an application from somewhere that converted PINE address books to Pegasus ones. When I moved to BBDB, I believe I entered everything again by hand while I was adding street addresses and phone numbers.

Transferring notes took some time, but went smoothly. notes-mode keeps a separate note file for each day, stored like ~/NOTES/199909/9909. note keeps everything in ~/.notedb (by default; it can also use MySQL, etc.). Luckily, it can read notes by STDIN, so, after sending Perl in to change the syntax of the topics in my notes to match that used by note, I could just use find to locate and cat each file and feed it to note. (Oh, how I love Unix.) A little fine tuning, and I was done.
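
The walk-the-tree-and-rewrite step described above can be sketched roughly as follows. This swaps Python in for the find, cat, and Perl combination, and the two topic syntaxes are invented for illustration: the real formats used by notes-mode and note are not shown here, so OLD_TOPIC and the "[topic]" output are assumptions.

```python
import re
from pathlib import Path

# Hypothetical syntaxes, chosen only for illustration: the old files mark a
# topic as "* topic" and the new tool is assumed to want "[topic]".
OLD_TOPIC = re.compile(r"^\* (.+)$")


def convert_note(text):
    """Rewrite each old-style topic line; leave every other line alone."""
    return "\n".join(OLD_TOPIC.sub(r"[\1]", line) for line in text.splitlines())


def converted_notes(root):
    """Yield (path, converted text) for every file under root, in path order."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            yield path, convert_note(path.read_text())
```

Each converted text could then be handed to whatever program reads notes on standard input; how that program is invoked is deliberately left out, since it depends on the tool.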

My schedule was recreated from scratch each time. Adding all the birthdays again was no fun, but I survived.

These experiences indicate that the software problem is not very great, at least for Unix users. People who have to deal with word processor files on Windows are in bad shape, but the rest of us can usually just look at a pair of file formats and fire up Emacs, vi, or Perl to make the necessary changes.

In spite of that, when I switch applications, experience has taught me to think carefully about the long view. My calendar, for instance, is not exactly locked in to Yahoo!, but transferring it somewhere would not be as trivial as it should be. Yahoo! gives me two options for creating backup files of my calendar. One is the Palm format, Date Book Archive, which stores the info in a binary file. If I looked, I would probably find tools for handling these files, but it still doesn't feel as secure to me as having the data in a text file. The other format is "Outlook format" (sic), or Comma Separated Values, which is quite ridiculous. (For example, instead of saying, "This birthday occurs on this date every year", it creates 37 copies of the birthday record, one for each of the next 37 years. How is the application that imports that supposed to know what was intended?) The best I can hope for is that DBA turns out to be a reasonable representation of my data, or another export format becomes available.
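
To make the CSV complaint concrete, here is a minimal Python sketch of what an importer would have to do to undo the duplication. It is a guess-based heuristic, which is exactly the problem: nothing in the file says "yearly". The column names "Subject" and "Start Date", the ISO date format, and the min_repeats threshold are all assumptions for illustration, not the actual layout of Yahoo!'s export.

```python
import csv
import io
from collections import defaultdict
from datetime import date


def collapse_yearly(csv_text, min_repeats=3):
    """Guess which rows are one yearly event exported as many copies.

    Rows sharing subject, month, and day in an unbroken run of at least
    min_repeats years become a single "yearly" event; all others are kept
    as one-off events.
    """
    years = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        d = date.fromisoformat(row["Start Date"])
        years[(row["Subject"], d.month, d.day)].add(d.year)

    events = []
    for (subject, month, day), seen in sorted(years.items()):
        first, last = min(seen), max(seen)
        unbroken = len(seen) == last - first + 1
        if unbroken and len(seen) >= min_repeats:
            events.append({"subject": subject, "month": month, "day": day,
                           "repeat": "yearly", "first_year": first})
        else:
            for year in sorted(seen):
                events.append({"subject": subject, "month": month, "day": day,
                               "repeat": "none", "first_year": year})
    return events
```

A format that recorded the recurrence directly would make this guesswork unnecessary, and would not silently stop after the last exported year.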

Take BBDB as another example. Now that I'm no longer using Emacs for everything (I've switched to using dedicated programs that can call an external editor, and call XEmacs with gnuclient), BBDB's VM and GNUS features don't matter to me, and there's no reason I couldn't move to using another address book, perhaps one that has good mutt support. I have to think about this two steps ahead; not only is there the problem of converting my ~/.bbdb to the format used by whichever application I pick, there's the problem of considering what I might have to do to move from that application to the next one I decide to use.

There's the crux of the software problem -- all the way down the line, my data doesn't change, but the ways in which my data is stored do. I still want to track the same information, so why shouldn't it be stored the same way whether I'm using an AppleWorks mail merge function, a Windows GUI address book, an Emacs lisp program interacting with mail and news readers, or a Web interface? If we could hop back in time and declare, "This is how address information will be stored. This is how a schedule will be stored. This is how notes and their cross-referencing information will be stored.", I could have used the same files for the last 15 years. My Apple, Windows, Linux, and Web applications would have all read and written the same files. There would have been no need to convert from one format to another. I could have switched back-and-forth between applications at will without a worry.

At this point, the XML alarms may be ringing in your head, and you may be eager to point out that, although it's coming to the game late, we're about to move into that happy situation. Well, maybe, assuming everyone can agree on DTDs and actually uses them properly instead of adding proprietary extensions at every turn. (Look at what happened to HTML; "This page best viewed with browser x" could evolve to "This calendar best viewed with calendar application y".) I certainly hope it works out. The problem lies in the unspoken assumption that you can upgrade your software to take advantage of the new, and hopefully final, format, and this takes us to my real worry:

The hardware problem

A few months ago, I joined the cellphone age. One of the initiation rituals consisted of an entire evening spent punching numbers from my address book into the phone. This is where the unspoken assumption fails. I can't upgrade my phone to new software capable of using the format used by the address book on my computer. Even if I could, I don't have a way of making my computer talk to the phone to pass my collection of numbers to it.

When a number changes, I have to change it in BBDB and on the phone. I have over 80 records in the phone, many with multiple numbers attached to them. If I drop it in the Baltimore harbor tomorrow, those numbers -- and the time spent entering them -- are gone. When I buy a replacement phone, I'll have to enter it all again. There are no backups, because there's no way to create a backup.

This is only going to expand as we move from using desktop computers to using more and more dedicated information appliances. It's a bad situation turning worse at the moment; we're going back to the days of incompatibility, but now with a wide variety of devices instead of just with computers. My TI couldn't talk to your Tandy; neither can my cellphone talk to yours.

This is what I meant by my ominous "Mostly." earlier. I can mostly be happy with the present situation. When I need to convert data on my computer, I have the tools at hand to do it. The problem is that that doesn't help me when I buy a phone and I don't have a shell on it. Even though all the data is sitting on my box waiting to be transferred, I'm stuck using the phone's only interface, the numeric keypad, punching "7" four times to get an "s".

Again, XML is put forward as the way out of this mess, and it holds great promise. When I mark a note as urgent on my laptop, XML should make it possible that my desktop machine and my Pilot will note the change and do whatever I've told them to do about it -- mark it off with a different color, beep me to remind me about it, or whatever. When I change Joe's phone number on my cellphone, it should be changed on the speed dial of all the phones in my house, in my address book, and in Joe-related events on my calendar.

It sounds wonderful, but I'm going to permit myself a dose of skepticism because the implementation requires the cooperation of a large number of people who prefer competition to cooperation. First, they have to reconfigure their devices to take advantage of XML. Then they have to agree on DTDs for the data they're using. Then they have to stick to the agreed-upon format and find other ways to distinguish their products now that they can no longer lock their customer base into their proprietary way of storing data.

I'm not saying it's impossible; the Internet has proven that it's possible. Proprietary protocols have been forgotten in the face of TCP/IP, HTTP, SMTP, etc. because software makers have to conform or die in all these areas. What I'm saying is that we need to be aware of the issue and keep manufacturers honest. Standards come into being in two ways -- people decide on a standard and implement it, or, more commonly, something becomes so widely used that it becomes the standard, even if it's unbearably awful. XML authors are trying to do it right the first time, but they're going to be outmaneuvered if manufacturers are allowed to implement the standards only in the ways and to the extents that suit them. It will eventually sort itself out -- a toaster that doesn't work with the other appliances on your home network is just not going to sell -- but there will be an initial competitive period that could be dangerously similar to the early days of personal computers, when nothing worked with anything else.

You can help shorten this period by being aware of the standards as they are created and checking that the products you buy are in compliance. If your new cellphone is supposed to use the new name & number storage format but you find that you can't share numbers from your address book with your friends' phones, take it back to the store, and let the manufacturer know that you exchanged their product for someone else's.

How long will it take for the old formats to go away? Will all our devices really be speaking the same language, and how soon? I don't know, but I do know that it will happen faster if we demand it. It's worth the effort, because it extends the ideals of the Internet into all the electronic accessories of our lives. When we can get there, there won't be TI information, Apple information, Windows information, Unix information, or Web information. There won't be information known only to your phone, your car, your Pilot, or your workstation. There will just be information, freely shared everywhere.


Jeff Covey received his degree in classical guitar performance but spent so much time with his computer that he fell in with a bad crowd and ended up working for Andover.net OSDN. He currently works on freshmeat and runs a computer lab for the kids in his neighborhood in his spare time.
http://pobox.com/~jeff.covey
jeff.covey@freshmeat.net


T-Shirts and Fame!

We're eager to find people interested in writing editorials on software-related topics. We're flexible on length, style, and topic, so long as you know what you're talking about and back up your opinions with facts. Anyone who writes an editorial gets a freshmeat t-shirt from ThinkGeek in addition to 15 minutes of fame. If you think you'd like to try your hand at it, let jeff.covey@freshmeat.net know what you'd like to write about.


 Referenced categories

Topic :: Communications :: Email :: Address Book
Topic :: Communications :: Email :: Email Clients (MUA)
Topic :: Office/Business :: Scheduling
Topic :: Text Processing :: Markup :: XML

 Referenced projects

The Insidious Big Brother Database - A contact manager for Emacs.
VM - Emacs-based mail reader

 Comments

[»] Nice editorial :)
by Falcon611 - Mar 10th 2003 02:25:30

Hah- And I thought I was the only one who faced the same problem!

I get concerned about things in the long run. After many years of going from different OS's, programs, formats etc, I pay close attention to 'exportability'. I don't use an addressbook built-in to a mail client, because it means I'll have to start the program up to lookup an address, it typically has light, if not nil support for exporting to other formats, and you usually can't open the data files, say if the poo hit the aircon and you were dragging config files out in a recovery procedure. This was the reason I used basic programs or scripts instead of heavy or advanced gui-based apps. Of course, it all depends on the mail client, and various other factors, but most would agree they're only there to save entering the same email address continuously.
Eventually (after searching through old, new, console, and x programs), I got myself a PalmPilot. Initially I didn't think I had much use for them, but after a year and a half of using one, I can honestly say they're great PIM's :)
Refusing to change the 'topic' (there are enough 'I love my PDA' pages out there), I keep all my 'personal info' on my PalmPilot. I use it to store addresses, birthdays (which is simply a program that reads the 'Birthday' fields of the address book, and inserts events into the-), datebook/schedule, general notes (eg email drafts), 'small' databases such as reg codes or moderator logs, passwords (heavily encrypted of course), and lists of todo's (household tasks, stuff I forget to do, etc), to name a few.
It syncs to my desktop easily, so I can enter data both at the computer and out somewhere (very few ppl spend their entire day in front of their own computer)- Underestimatively (is that a word?) useful for the occasions where you want to store or retrieve someone's info at, say, a coffee shop or similar.
The desktop client I use has few dependencies, and exports to commonly used formats, so when I eventually move on, or my PalmPilot dies, I won't spend days re-entering everything.
I have a mobile phone, and although having a PalmPilot adds to the `daily loadout weight`, it saves entering large amounts of numbers into an inefficient 15 key device, aimed at showing cute little animations over speed and efficiency. Besides that, I can keep just 1 big address list, instead of looking after a mobile for phone no's, a book for birthdays, MUA addr book for emails, etc.



[»] Setting data free with LDAP aks Internet task committee!
by Alan - Jul 21st 2002 15:03:59

Hello all - all these concerns were addressed by the LDAP protocol. Check openldap.org, MS AC, Oracle Internet Directory, iPlanet (Netscape) LDAP, Novell NDS, etc. Stay cool, Alan



[»] More standards for data exchange
by Ratface - Nov 23rd 2000 07:36:53

What articles like this (and comments from people such as my boss who is a "typical" computer user) make me realise is that there is definitely a need for open standards that are built upon "lowest common denominator" technologies such as ASCII text. By this I mean markup languages such as HTML or even better, standards built upon XML (as many have already noted in these comments).

One such effort that is underway is the SyncML project which is a cross-company project aimed at defining and implementing a data exchange language based upon XML which would be used for synchronising information such as contact and calendar info across as many devices / environments as possible.

Whether this particular project will come to be of much use or will die an ignominious death is unclear, but my hope is that with time more and more data will be stored in a format that is simpler to manipulate.

My view is that as programming becomes more wide-spread and especially Open Source programming projects, the mystique behind data formats will slowly be broken down. It seems to me that proprietary data formats are a legacy of the closed-software style of development. We can all hope that more sensible choices for data are made in the future and that this approach is adopted more widely with time by even the closed-source software companies.



[»] Tarballs
by Ulric Eriksson - Nov 21st 2000 06:55:11

How is a tarball not a single file? That is the way Siag Office has been storing structured documents (documents containing other documents) for years. It works great, the contents can be examined with standard tools and so on.



[»] a tarball is NOT the answer
by Q-Funk - Nov 21st 2000 06:28:29

Sorry, a tarball does not qualify as a single file, and therefore neither does an XML document bundled with the images inside a tarball.

one, single binary file, please.



[»] storing XML and complex binary data in one file
by John Califf - Nov 20th 2000 19:39:25

This already works in koffice. The document is pure XML, but if there is binary data it is treated as a separate file or files appended to the xml (not interfering with it) and decoded or encoded afterward. This is necessary with image data, as it's not feasible to store many megabytes of binary image data in text format with tags. It works beautifully. The whole thing is tarred and gzipped into one file, internally preserving the directory structure that the extra data, usually binary, uses.

There's still a problem with different apps having different DTD's for similar kinds of data (Kde and Gnome, for example) but if both use XML with open formats translating is much, much easier than between XML and a closed file format like MS Office uses.

There is hope yet for better compatibility.



[»] XSLT solves DTD agreement problem
by Brendan Macmillan - Nov 20th 2000 17:28:05

You are right that manufacturers are not always going to use the same DTDs. Yes, it's partly an attempt to lock out the competition and reap monopoly profits, but it's also natural evolution. It takes effort to stick to a standard.

But I think one of the strong points of XML is that the DTD is self-documenting and human readable (and hopefully comprehensible). This means that if two applications/devices use incompatible DTD's, it is generally relatively straightforward to write a transformation expressed in XSLT to convert between them.

It won't make the problem go away completely, but lowers the barriers to data conversion enormously.



[»] TEXT ONLY PLEASE
by The Cisco Kid - Nov 20th 2000 14:09:21

I use pine on my own personal linux server (for mail, and notes, and anything else of importance). I can read my mail from pretty much anywhere.

A web-enabled phone would be useless to me, as would one with its own proprietary email system. The only way any such device would be useful to me would be if it supported TELNET, and a reasonably correct terminal emulation (vt100 would be fine).

The 'one true format' for information should be plain ascii text, possibly in some parseable form. There are damn few places that won't work. Graphics and fonts are great, but should never be REQUIRED to read information, but should always be extra options (optional to the READER, not the sender)



[»] re: against proprietary formats
by chardros - Nov 20th 2000 13:23:33

Regarding images and text not being able to be stored in a single file... this is not a complex problem to overcome. Good ol' .tar.gz can do the trick, can it not? *KEEP* it separate, that's fine. But treat a document as a tar.gz. If the application can handle the tar/compression on the fly when opening/closing then is this not as good as one large binary solution? With the notable exception of being better as far as some readability from a text editor for the txt portions of the document?



[»] against proprietary formats
by Q-Funk - Nov 20th 2000 11:55:56

Every time a potential employer e-mails me something in Microsoft's crappy Word format, they have the unpleasant surprise of receiving a reply in Papyrus format. An eye for an eye, a proprietary format for another; everybody loses.

Lately, proprietary formats for documents is an issue I have examined repeatedly, because while e-mail messages, HTML markup and JPEG/PNG images are well-documented and highly portable formats, Excel and Word are not (no matter what acceptance they have found in offices of Wall Street), neither is PDF, yet those are imposed upon the scholar and technical crowd (not to mention the casual home user) by management staff.

My main conclusion has been that those formats have one undeniable advantage: text and pictures are binarized and result in a single file that can easily be e-mailed and printed out.

By contrast, markup languages (including XML) are very good at document structuring and generation (e.g. using XSLT schemes), but simply do not have the convenience of a single file whose appearance and layout can be controlled in a predictable manner.

The answer is therefore obvious:

This world needs open-source document formats that match the possibilities of Word and PDF, but whose specs are determined by a panel similar to W3C.

Until such formats exist, Microsoft will be able to shove their crap down everyone's throat, without anyone offering the means to offer any alternatives.



[»] Why distribute data over various devices ?
by Goesta Smekal - Nov 20th 2000 10:56:56

What initially came into my mind, as I left WinWorld was: "Why did I use 1 calendar + 1 addressbook + 1 todo list etc... " Why not store all your personal data in one DBMS. I currently work on my personal Postgres DB storing all those things. When I need data in the future I use my nifty (not yet written) web interface to access it from all 'round the globe.

And for the wear-me devices: With GSM and (soon) UMTS at hand (at least in europe) newer cell phones, for instance, are WML enhanced, so they are able to share information over the Net.

IMHO the intelligence should move back to the center again, as powerful communication protocols are available. Distributing data may be nice for security, but sucks when you come to think of the usability. I'd rather carry a slim, dumb terminal with me, as long as it still can talk to the backend in ten years, no matter what I can do with that backend then ...



[»] varying data formats
by rob helling - Nov 20th 2000 10:38:01

Just my $.02:

when i shifted to the *nix world i found it one of the most advanced achievements of 'culture' that almost all data was kept in plain text files that could be edited with a text editor or manipulated and parsed by simple perl scripts. you didn't have to care anymore what the meaning of the n'th bit in the m'th byte in the config files was. Most of the time it was just

key=value # comment to remind me why I set it that way.

and grep turns out to be useful nearly everywhere no matter if the file is .addressbook or mail/folder or .somethingrc or you name it.

There are exceptions, though (let me say sendmail) but most of the time, the problems described in the editorial can be solved with a few lines of perl or emacs. Let's make it all of the time!



[»] Setting data free
by Joris van der Hoeven - Nov 20th 2000 10:05:27

Contrary to what your title suggests, you did not talk a lot about the antagonism between free and proprietary data formats. I think it is a shame that it is possible to copyright or patent data formats. It is even more of a shame that users are not warned about this: how many software packages are there which issue a message like "attention, the GIF format is proprietary and can only be used by ..." each time you try to save an image using the GIF format? In other words, after buying a program, the customer might not be the unique owner of the documents he produces with it, and he is not even aware of this...







© Copyright 2008 SourceForge, Inc., All Rights Reserved.