Wednesday, May 23, 2007

Abacus v. Supercomputer

In my general research field, there are various methods for data acquisition and organization, involving technology of varying complexity, time, and cost. For some tasks, the most technologically advanced method is definitely the best or only way to go, but there are at least some situations in which there is a choice between a lower-tech and a higher-tech method. This latter situation can set up a bit of a conflict between professors and students/postdocs.

Two examples:

- An MS student has repeatedly questioned why he/she has to use a low-tech method to acquire, somewhat tediously, some data that could be acquired more rapidly with a higher-tech method. I say 'more rapidly' because the actual acquisition time once the machine is on and ready for analysis can be fairly rapid, but this technique becomes much less rapid when the substantial (and tedious) preparation time is considered. In any case, with the low-tech method, you can get data any time you want, and the amount of data you get is limited only by your time. This technique also has the pedagogical advantage of not being a 'black box' that magically gives you numbers. With the higher-tech method, in which this student is not trained and is unlikely to be trained on the timescale of the degree program, the student has to rely on other people to get the data and will get a much more limited dataset. And then there is the issue of $$. The low-tech method is essentially free; the high-tech method is not. The student is working on a project with limited funds available for research expenses. You do the math, either on an abacus or a supercomputer. The plan I worked out is that the student will get some high-tech data (with assistance from other people) using the limited available funds, and will supplement these with however much low-tech data can reasonably be acquired. I think that is a good plan, but the student does not yet see the awesome professorial wisdom of my plan, despite my attempts at explanation. Perhaps the student has been reading too much mythology, in which people are assigned endless useless and tedious tasks as punishment for random things?

- There is another technique that my students, and other professors' students, have a tendency to use for a certain data organization method. This method involves complex and expensive software that doesn't talk to any other software and that occasionally is updated to new versions that don't work with older versions. In some cases, you lose even the ability to see your data ever again. I hate this software. What is more, even if you do everything right, this software does not produce a result that is immediately usable -- you end up copying parts of it to a cheaper and more accessible program anyway. I've had quite a number of conversations with colleagues in which we share stories of all the time and data our students have lost in this software black hole, and how difficult it is to convince them to use simpler but less cool software from the very beginning. My preference is that the data be stored in more than one place -- if some of it has to go into The Software From Hell, it should also be saved in some other, more accessible format. Some students react to this as if I'd asked them to enter their data in a ledger using a quill pen and ink made from lampblack.

With reference to mythology again, there are stories of humans being given eternal youth, a mixed blessing if the people around you age and die. Professors rather famously have the opposite situation: we get older and older and our students are always young. And as we get older, it becomes more likely that our youthful students will think we are asking them to use antiquated methods just because we used these methods when we were students.

11 comments:

Anonymous said...

As a graduate student I complained to my advisor about an 'old' method he used and published on. My new-fangled computerized database system should work much better, I said.

He said I was probably right and to try it out - that he did it the way he did as that was all that was available at the time.

Imagine how disappointed I was when I spent a week creating the model and crunching numbers on my fancy computer, only to come up with the same results he had gotten in 2 hours using paper, pencil, and a balance.

UGH - I was so disappointed to have to go to his office and show him the same results he got. He just chuckled and said that he sure got lucky the first time around. We still laugh about that today.

-Katie

Anonymous said...

My advisor (who got his PhD in the days of physical tape archives and 5.25" floppies) absolutely insisted that we archive data in two distinct and physically separate locations. Hard drives fail. Labs get flooded. Animal rights people occasionally get rabid. In this day & age of fast data transfer and 300GB external drives, there's just no reason not to do it!

Anonymous said...

Great insight for students in graduate school who have similar experiences with their professors! Thanks FSP! :)

Anonymous said...

hmmmm. .... are you sure you really aren't just making them do it because you had to suffer through it in grad school? I agree about data storage -- but tedious, mind-numbing data collection when a superior alternative not only exists, but is readily available? Sounds like you are utilizing your cheap labor to your advantage -- glad I'm not the cheap laborer.

I know of a woman who forces her grad students to do minipreps w/o a kit because, she claims, one should know how it is done. That makes some sense in the first year of grad school, but after that? Sucks to be them. Another good example of why it is so critical to talk to grad students about the PI before joining the lab.

Female Science Professor said...

It depends on how you define 'readily available'. To use the more advanced technique involves other people's time and lots of $$ that does not exist for this project. The student has a choice of getting limited high-tech data, or getting limited high-tech data and supplementing it with low-tech data. The choice is the student's; I have just made a recommendation and explained how much money and time are available for the expensive analyses. If resources were infinite, maybe you'd have a point.

BTW, the student chose this project knowing there wasn't funding for the high-tech analyses. A previously stated interest in the other techniques evaporated in the face of actually having to do the work. That happens. I am hoping that some of that early interest will be revived if the data are as interesting as I think they will be.

Anonymous said...

Very interesting subject... There are so many commercial instruments that run proprietary software; the best ones at least allow you to export data as Excel or ASCII and plot it with your preferred software, but even then, it is so hard to get the metadata out of those files!! I just moved to a new lab which has a large number of commercial instruments, and it is nearly impossible to keep the data backed up, etc. -- not to mention that you need a separate PC for each manufacturer's instruments. AAARGH! High tech doesn't always save time or produce better data...

Doctor Pion said...

To your student: Do you want to be like the Wizard of Oz? "But I don't know how it (the balloon) works!"

To everyone:
As anonymous2 suggested, are your "backup" copies in the same room as the disk drive that would get destroyed in a fire?

How long before your CD or DVD is as useless as an 8" floppy written under the CP/M operating system? [FYI, that would be 25 years ago.] Maybe it won't matter because the data are of limited interest, but see below.

Regarding point #2 in your article: True story about the long-term viability of data storage. A really cool new experimental result (the EMC effect, IIRC) gets published, and folks at the Stanford linac realize some old data could shed light on it. They find the data tapes on a rack in a hallway. They are on 9-track tape (the big reels you now see only in old movies). Their computer that could read them is long gone. They ask around. They finally find a Soviet-era "clone" of an IBM 360 (already a museum piece at that time) in a Russian lab, process the data, and publish the paper.

Rebecca said...

I'm a big supercomputer user myself (working in the high-performance computing field), but there are times when a supercomputer is not appropriate and an abacus is. Just because you have a big hammer doesn't mean that every problem is a nail.

Ms.PhD said...

I suppose this is one of those cases where I am on the younger side of things, and partly thanks to my computer-savvy partner.

My personality tends toward the impatient, so the quicker-cheaper method is appealing since I can get answers faster in the short term.

But. When the activity is tedious, repetitive, and needs to be done over and over for hours, days, weeks, months, and years, it starts to make sense to migrate over to - ack! - the computer.

If these software problems affect everyone, it's probably worth the up front investment (maybe everyone can go in on it with you?) to get something better written for your customized purposes, which will be stable, adaptable, updatable FOR THE LONG TERM.

Yes, that's what I said. Spend the money, hire someone, and create a tool. Use tools, feel human!

While it might seem like student-hours are free, they're not. One thing that frustrates me about doing things the 'cheap' way is that it often takes longer and thus actually costs more, over the long term.

Minipreps, as mentioned by one of your commenters, are a great example. DNA preps in general have come a long way since the early days. Doing it the 'cheap' way takes all day -- the newest kits have you done in less than an hour, with no loss of quality.

And in our lab, as I suppose is probably true in labs all over the country right now, there's an ongoing battle over using pre-cast protein gels. It's a great generational divide. But to my mind it's simple: companies can make gels, and gels can practically run themselves, but only human lab members can read papers, think, interpret, and invent new hypotheses... isn't our time better spent doing that?

antia said...

This post really resonated with me. With my own graduate students, the battle is convincing them to use tools for their data analysis that are compatible with what other people in the lab use. The problem is that in many cases they'll find some funky software, get utterly confused, and then require somebody's time to help them figure it out. Secretly I would just like to force them all to stick to ASCII data files and use only multiplatform free software ...

Then again, my opinion on these things has changed since I was a graduate student on the other side of the fence.

Becky T said...

Having only recently finished my Ph.D., I must admit that we, the younger generation, rarely recognize the use of "old-fashioned" methods. I generally like to analyze my data on a computer, but I'd always learn something new about it when my advisor took out the pen and ruler. I'm starting to see the light.