Data Management

Personal Space

A little late to the party, I've been having a play with Evernote. A lovely concept, quite nicely executed, but the two usual questions arise. Firstly, will I still be using it in two weeks, and secondly, where exactly is my data? Actually, this second issue came up when I asked a colleague whether she had used it: it was her primary concern and the reason it never made it out of the sandbox for her. Generally, I'm fairly unconcerned about what data of mine is being stored, and where. But that depends on the data, of course. In this case we're talking about a few articles about aikido, bio2rdf and a snapshot out of my office window. When it comes to storing much more pertinent information though, I too would want to get some straight answers about where exactly it is, and how it's stored, encrypted, backed up and all-round looked after.

I'm a big fan of Dropbox. I have a personal account, and the Knob Jockeys have a premium account for storing their music data (and use 90% of the fifty-something gigabytes available). By actually paying for a service, you definitely get the feeling it's slightly more secure and safe than a free one, though whether or not that's true is hard to know. Of course with Dropbox, our music data is also backed up by being on multiple machines, two of which are backed up onto Time Machines anyway, so in a worst-case scenario, such as the company going under, it wouldn't be a huge problem.

Any company or largish storer of data will of course have a dedicated provider, and a contract that covers unfortunate eventualities. I think it's time that this sort of service was available for the average man and his digital photos of dogs.

Such a service would have to have the cooperation of data management services, not be the data management service. You want cold, hard, reliable storage, nothing more, nothing less. Oh, and an API so all of your other services can use it, of course.

I don't think it's rocket science. I'm not talking about a perfectly amalgamated, semantically described data model, just the digital equivalent of a little out-of-town lockup. It could work like memory cards in consoles used to: allocate a couple of slots to Facebook, one to SlideShare, three to Flickr, and so on. These sites would probably keep a cache of your data themselves, but they could use your own storage for the main copy of the data, or at least a backup. The data wouldn't even have to be in a format you could understand yourself, although users might come to prefer the services that stored it in one they could. Services that currently try and scour the web for your data and consolidate it could just run on your personal cloud.
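To make the memory card idea a bit more concrete, here is a rough sketch in Python of what such a locker might look like from the inside. Everything in it is made up for illustration: `PersonalLocker`, `Slot`, and the `allocate`, `store` and `fetch` methods are hypothetical names, not any real service's API, and an actual lockup would need authentication, encryption and durable storage rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    """One fixed unit of storage, owned by at most one service (hypothetical)."""
    service: Optional[str] = None
    blobs: Dict[str, bytes] = field(default_factory=dict)


class PersonalLocker:
    """A toy, in-memory stand-in for the 'out-of-town lockup' (hypothetical)."""

    def __init__(self, slot_count: int) -> None:
        self.slots: List[Slot] = [Slot() for _ in range(slot_count)]

    def allocate(self, service: str, count: int) -> None:
        """Hand `count` free slots to a service, like assigning memory cards."""
        free = [s for s in self.slots if s.service is None]
        if len(free) < count:
            raise ValueError(f"only {len(free)} free slot(s) left")
        for slot in free[:count]:
            slot.service = service

    def store(self, service: str, key: str, data: bytes) -> None:
        """Save a blob into one of the service's own slots."""
        owned = [s for s in self.slots if s.service == service]
        if not owned:
            raise PermissionError(f"{service} has no slots allocated")
        # Overwrite in place if the key already exists somewhere.
        for slot in owned:
            if key in slot.blobs:
                slot.blobs[key] = data
                return
        # Otherwise use whichever owned slot currently holds the fewest blobs.
        min(owned, key=lambda s: len(s.blobs)).blobs[key] = data

    def fetch(self, service: str, key: str) -> bytes:
        """Read a blob back; the locker does not interpret the format."""
        for slot in self.slots:
            if slot.service == service and key in slot.blobs:
                return slot.blobs[key]
        raise KeyError(key)


# The allocation from the text: two slots to Facebook, one to SlideShare,
# three to Flickr.
locker = PersonalLocker(slot_count=6)
locker.allocate("Facebook", 2)
locker.allocate("SlideShare", 1)
locker.allocate("Flickr", 3)
locker.store("Flickr", "office-window.jpg", b"jpeg bytes")
```

The locker treats everything as opaque bytes, which matches the point that the data needn't be in a format you understand yourself: the storage just holds it, and each service is responsible for making sense of its own slots.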

Imagine using Flickr, Facebook and Evernote knowing that a copy of all the data they hold for you is also stored in your own personal locker. Now wouldn't that be worth paying for?

Got to have a system

I need a way to organise and manage my scientific literature. Doesn't everyone? Handily, we're thinking of building one. Perhaps the most blogged topic on the internet, bar blogging, is how to organise. Organise, plan, approach, manage - it almost always boils down to the same thing: having a system. In the timeless words of Dr Hill, "you've got to have a system". Whatever scale of task is before you - organising your photos, planning your tasks, approaching your work, managing your life - there's certainly no shortage of systems to adopt to aid in your quest. Systems based on simplicity, complexity, combining, compartmentalising, mixing-it-up, not-having-a-system, there's every kind out there.

Once you've tried a few of these systems, you begin to cotton on to something. This may be generalising greatly, but in almost all cases, each system a) has some great qualities, and b) isn't perfect. One of the wonderful things about free will, however, is that we don't have to just follow one system: we can roll our own. Find someone of the appropriate personality type (you know the type I mean), and ask them how they organise their academic literature. Or their cutlery drawer. Or how they make coffee. Or why they do that first, and put this thing here and that one over there. These sorts of people love systems. They love to have their own systems, which they've tailored over countless iterations to suit a task perfectly. And lo and behold, quite a large proportion of scientists are exactly this type of person.

And so it comes to finally organising your academic literature (just this one last time, you tell yourself), and what to do? Well, you would pick a system, but there are just so many out there. You'd use the old system you had for organising your physical papers, but it just doesn't quite fit, and there's still highlighter pen on your screen from the last time you tried. So what's better than coming up with a system? Picking a service that gives you a system! Yes, that's what we do: we look for the latest lovely application or website that promises to do all that organising for us, so we don't have to.

Playing with countless applications (Papers, Mendeley, BibDesk, CiteULike...) that organise your papers in their own brilliantly clever new ways is great. Really. But you soon find that they all do it in a way that feels not quite how you'd have done it, if you'd written it yourself. Simple: build your own application that does things in just the way you like. What? No one else finds the application as useful as you do? How peculiar other people are. So how do we get around this? What should the application do, if not supply the user with a system? I think it should let you use your own system.

In some fields, an application gets to a point where it is no longer an application, but a tool. Take Photoshop. Ask three graphic designers how to perform some task in Photoshop, say cutting out a foreground object and adding a shadow, and you'll be shown three completely different but valid ways, such is the flexibility and comprehensiveness of Photoshop. The application has become a tool that facilitates each user's own way of working.

A digital literature manager that is going to hang around for any length of time needs to be like this. It needs to provide a straightforward way for one set of users, the minority that just do what they're told, to organise their papers. And for the other set of users, all those pesky scientists who like to have it their way, it needs to help them devise and use their own system of organising their papers. It needs to be lightweight, simple, and not get in the way. It needs to be a tool, not an application.