Wednesday, May 19, 2010

Good For You

When reviewing an NSF grant proposal, the choices for rating the overall proposal are:
  • Excellent : Outstanding proposal in all respects; deserves highest priority for support.
  • Very Good : High quality proposal in nearly all respects; should be supported if at all possible.
  • Good : A quality proposal worthy of support.
  • Fair : Proposal lacking in one or more critical aspects; key issues need to be addressed.
  • Poor : Proposal has serious deficiencies.
The overall rating includes evaluation of both Intellectual Merit and Broader Impacts components of the proposal, but for most Science proposals, the Intellectual Merit score is typically the one that drives the overall rating.

Obviously it is important to get lots of Excellents, but it's possible to get a grant with some Very Goods. In some cases, it is possible to get funding with mostly E/VG but also a Good (or worse) if the negative reviewer amply demonstrates a lack of objectivity, knowledge, or sanity.

As is well known by those who have submitted an NSF proposal, a rating of Good is Not Good. Reviewers who interpret Good as indicating "a quality proposal worthy of support" are typically those outside the NSF system who think this description actually means what it says. It does not. Good is Bad.

A reviewer who has serious concerns about a proposal or a PI can do a lot of damage with a rating of Good, but Fair and Poor are the real killers. When is a rating of Fair appropriate? When is a rating of Poor appropriate? (Let's assume that the reviewer is objective and is not downgrading a proposal for evil reasons.)

In ~20 years of reviewing NSF proposals, I have only given a few ratings of Poor. From what I've seen on panels and other NSF committees, this is not just because I am nice; a rating of Poor is rare in my field of science. Also, I am not particularly nice.

For a proposal to be Poor, it has to be so bad that there is no hope that the proposed research will be successful in any way. A dull but otherwise solid proposal (i.e., "We are going to get some data but we don't really know why") is not Poor. Such a proposal might be Fair or Good, depending on whether anything of use or interest could be salvaged from such a study. A Poor proposal is apocalyptic. A Poor proposal has no redeeming qualities.

I have given more ratings of Fair than Poor over the years. Fair (in my opinion) indicates a flawed proposal but not one that generates feelings of anger or disgust at the stupidity of the proposed research. A Fair proposal involves poorly defined ideas and/or inappropriate methods. It is possible, but unlikely, that the research would result in anything useful or interesting, but awarding the grant would not be a major travesty; it would be a semi-travesty.

What if a proposal is not well written or organized but the research is not as bad as might seem based on how the proposal is written? That can be a difficult situation to assess. Does the bad writing and organization indicate a problem with the research design or implementation? Or are these aspects technical flaws that mask the excellence (or very goodness) of the research?

In these cases, I make a decision based on the research, but I may mention in my review the aspects of the proposal writing/construction that are problematic if it seems that these might impact the research. A proposal is not a manuscript, however, so I don't do any pointless technical editing.

If the PI has experience with writing proposals, the only technical flaws that are important to note are those that might indicate a problem with the research. An inexperienced PI, however, might benefit from advice about proposal writing; ideally this advice will be constructive and not patronizing.

Most proposals that I review are Very Good. A few are Excellent. More are Good or Good/Very Good (although split ratings are kind of lame). Some are Fair. Rare ones are Poor.

These ratings are important for determining whether a proposal is further considered, but an essential aspect of the review is the comments, which should match the rating. I recently read some reviews of one of my proposals, and there seemed to be no correspondence between the comments and the ratings. Some of the most negative ratings came with very positive comments; it was odd.

Just as with reviewing manuscripts, though, I know that my opinion is just one part of the process. If I rate a proposal as Poor or Fair (the equivalent of recommending rejection of a manuscript), I am not making the final decision to reject the proposal (or manuscript). I am providing my opinion, and this will be considered along with other input from other reviewers and panel/program officers (or editors, for manuscripts).

If my opinion is honestly given and is based on a careful reading of the proposal (or manuscript), then I don't feel bad about giving a negative rating. I might feel sympathy for the PI (author), but I was asked for my opinion, and I gave it, explaining the reasoning for my negative opinion. It's up to others to evaluate my review along with the rest of the review information and make a decision.

30 comments:

Comrade PhysioProf said...

In other news at the cutting edge of the scientific enterprise, researchers report that water is still wet.

(Yeah, I know. Posting every single motherfucking day, Monday through Friday, is fucking hard.)

mOOm said...

So if good isn't good then this is another case of "grade inflation"?

I also see something like this in job advertisements (at least here in Australia). Everyone seems to advertise for "outstanding" people. But the people on their faculty are mostly not outstanding in my opinion in an international context. So maybe they don't really mean "outstanding"?

Anonymous said...

I once sat on a panel with a Canadian dude (i.e. someone outside the NSF system). He rated everything at least one grade lower than the other reviewers. He kept saying "that's how we do it in Canada", which may be true, but it was clear he was going by the labels themselves rather than what they're intended to mean.

Female Science Professor said...

For those of you bored by this post, I should have mentioned that I had a request by e-mail for a discussion of this topic. At least one person will be interested, perhaps.

GMP said...

@ComradePhysioProf, why the snark?
I am sure there are a lot of junior faculty who are trying to learn the ropes and are not sure what it takes to get funding from NSF. It does appear to be a nearly random occurrence sometimes... I think the post is certainly relevant.

I think the acceptability of Good depends on the NSF division. In some divisions you hardly ever see any Excellents; people simply never give them, and a proposal with mostly Very Goods will be funded. I must admit I rarely give Excellents myself, and I also almost never give Poor (though there was a 2-page proposal once that I obviously had to rate as such...)

What baffles me is that sometimes you get these outrageous spreads of ratings (from Excellent to Poor on the same proposal). That's just obvious BS, as a proposal must have an objective worth. Some of the program managers I have worked with will actually work with a panel to get the ratings to converge around a rating. I wish more were like that.

And I feel FSP's pain on the cryptic rating-low, remarks-positive kind of review. As with paper review, this is yet another demonstration that some people take proposal reviewing far too lightly...

Monisha said...

Whether there is an 'objective' value for any research proposal is an interesting question...that's not to claim that all is subjective, but when it comes to issues of prioritizing research areas and kinds of data within a field, surely some non-objective things become relevant? Or perhaps that is more the case at NIH than at NSF.....

Odyssey said...

The overall rating includes evaluation of both Intellectual Merit and Broader Impacts components of the proposal, but for most Science proposals, the Intellectual Merit score is typically the one that drives the overall rating.

For any young faculty who read this, the above does not mean Broader Impacts don't count. A proposal with outstanding science but poor (or non-existent) BI's will not be funded (at least in the biological sciences). The NSF is very clear about this and I've seen it put into practice on more than one occasion while serving on review panels. On the other hand, excellent BI's will not rescue "good" science.

Anonymous said...

Don't worry about CPP. Half his posts are clip art of brooms, for crissakes.

Good is good. But if only 10% (or less) of proposals are funded, good is not good enough.

Sometimes when I am especially worried about anonymity (because reviews are not anonymous to panel members, and word does get out), I will write relatively harsh comments (aka my honest opinion) but give a more generous score. It's my weasely way of shifting responsibility to the panel.

Gerty-Z said...

I remember the first fellowship application I wrote. It was for the NIH which, at least for NRSA applications, has a similar scoring system. I was pretty excited by the "very good" I was given, until my advisor basically told me that very good was actually NOT. It was more "average". I ended up getting funded (barely; I was the last one above the line). I don't understand why, if you are going to assign designations, the labels can't be more transparent.

Anonymous said...

I tend to agree that this must vary by division. I received 3 Goods and 1 Excellent on an ecology proposal, and it was ranked as Meritorious by the Panel. According to the statistics provided, fewer than 1/2 the proposals received were ranked Meritorious or higher (the rest, over half, were ranked as "Not Competitive").

Of course, only the top 15% were funded, so good isn't good enough (but I don't think it's BAD).

As a reviewer, when I rank a proposal as "good", I'm saying to the panel, don't fund this now, but here are a bunch of good comments for the PIs to use for their resubmission. When I rank excellent, I'm saying this is novel, timely, and these PIs have their shit together - fund it NOW. A fair or poor is my way of saying, don't bother resubmitting these ideas.

Anonymous said...

"but it was clear he was going by the labels themselves rather than what they're intended to mean."

Isn't a system in which the labels don't mean what they're intended to mean somewhat broken? Maybe the labels need re-naming?

Anonymous said...

I am a Canadian who was approached for a review by the NSF before (had to decline on account of being the mom of a 1-week-old at the time). That being said, when asked again, I would only have the information that the NSF provides. So I would be inclined to rate Good as Good, unaware that Good is Not Good Enough. Needless to say, I find this post very interesting. It may be your grant next time, CPP (insert smilie here), so be grateful to FSP for informing people like me on the workings of the NSF.

I do find it interesting (and frustrating) that the system seems to require people to outmatch each other's superlatives, to the point of making these rankings (or reference letters, for that matter) mostly useless.

Douglas Natelson said...

It's always nice to get a sense that my own rating scale coincides well with someone else's. FSP, I think we have very similar attitudes about this. The only "poor" rating I ever gave was at a panel for SBIR proposals, where it was clear that the investigator had no idea at all what constituted a proper NSF proposal.

Adding to what Odyssey said up-thread, I refuse to give an E rating to a proposal that has a one-sentence "Broader Impacts" section. I don't expect everyone to reshape the nature of K12 education for the underrepresented, but at the same time, at a bare minimum I do expect people to think about how they can plug into existing efforts at their institutions.

The Lesser Half said...

Anecdotes:

- A proposal with 9 reviews (not funded). Seriously, how do you get nine scientists to agree on anything?

- Proposals with a range from poor to excellent (happened twice, once funded, once not)

- A proposal labeled as both a "bargain" and "too expensive" (funded)

- Consecutive proposals on roughly the same topic denied for being either "too far-fetched" or "too incremental".

lost academic said...

FSP, do you have any experience sitting in review of the NSF GRF applications?

The Lesser Half said...

Oh, I forgot one:

- European colleague's comment on my proposal: "It was fantastic, that's why I gave it a 'good'".

When I explained the system to him, the look on his face was priceless. Lesson learned.

Anonymous said...

Ratings vary by directorate as well. CISE (computer science) has the overall lowest rankings. On CISE panels, you see many Fair and Poor rankings. (I admit it, from me as well.)

Anonymous said...

As an undergraduate who will apply for the NSF GRFP, it's good to know that it's not just newbies to the system who find it opaque, to say the least (the gradcafe has some fun stories about the comments not matching the ratings).

I understand that the people who review for the NSF have things to do, but at least make sure that you aren't calling the person by the wrong name in the comments, or making comments that don't apply to the application. Why? Because you should take some sort of pride in your work.

chablooi said...

HEY thanks this was the kind of discussion I have been looking for in the blogosphere.

Recent NSF grant that got rejected had 1 excellent, 2 very goods, 3 goods, and one fair.

The "fair" reviewer listed almost 5 full pages of detailed comments that were mostly minutiae and a checklist of items we had to complete to prove that we are worthy of getting funded. I have never had a review like this. It was very poisonous commentary, so over the top that there is no way to overlook how rude it was. One of my favorite quotes from that reviewer: "What might be a proposal to do highly transformative work is bogged down by a total lack of citations that reference the admirable work of others."

I feel our work has been entirely sabotaged by this process and have placed no fewer than 4 phone calls to the program manager to ask for clarification. Are proposals normally reviewed 9 times? How can other reviewers have such glowing things to say?

After 4 phone calls and 6 emails, I still have yet to hear from the person in charge. Any advice?

Anonymous said...

The panel will read the actual review, not just the E/VG/G/F/P score. I've often seen proposals given an E with a so-so review (or a review that does not explain why the excellent score was given), and the panel will basically disregard the E; or VG scores being bumped up to E if the reviewer was being too stingy with the score.

Anonymous said...

On the panels I have been on, I would say that Good was not "bad"; rather, Good was a stand-in for "OK, if there were unlimited money we would fund this, but given that money is limited and there are so many better proposals, it's not Fair or Poor, but sheez, can't you do better?"

I agree that Fair and Poor are the kiss of death, with Poor being a stand-in for "why did you ever send us this? Please, I don't want to see it again..."

Anonymous said...

Isn't a system in which the labels don't mean what they're intended to mean somewhat broken?

Nah. We understand the meanings through context and interactions with others, just like all uses of language. Any set of labels will eventually get redefined from their "true" meanings in some way.

That being said, when asked again, I would only have the information that the NSF provides. So I would be inclined to rate Good as Good, unaware that Good is Not Good Enough.

Sure, but once you're at the panel and having the discussions, you would probably upgrade your ratings to match those of your fellow panelists. The Canadian dude that was on my panel basically refused to do so.

Female Science Professor said...

A proposal must have at least 3 reviews for NSF to take action, although of course it is better to have more. A program officer may send out requests to many possible reviewers, and if they all send in reviews, it is certainly possible to get 9 or more reviews.

zed said...

I've had 9+ reviews many times. Have also experienced wide variation between disciplines, since I work at the boundary of a few disciplines. Each sub-group has its own interpretation of these ratings, and their own style of reviews.

John Vidale said...

Nice commentary. Nothing controversial, as CPP pointed out. And no, the adjectives do NOT have to match the meaning of the review.

A final note: the ratings in a review do matter. Although only the ratings of the panel matter in the end, the panel tends to follow the initial ratings unless they strongly disagree. So ratings inconsistent with the words in a review have an uncertain effect.

Anonymous said...

To the person asking about GRF:
I applied for the GRF 3 times and didn't get it. I applied for a DDIG and got it first time out. One of the critical things? On my GRFs I got knocked for not having good enough Broader Impacts (my plans were vague), but in my DDIG I had specific plans addressing broader impacts. In my experience, Broader Impacts are critical, even for a GRF. Come up with something specific you COULD do; it's OK if it changes later.

Comrade PhysioProf said...

It was very poisonous commentary -- so much over the top and there is no way to overlook that it was quite rude. One of my favorite quotes from that reviewer: "What might be a proposal to do highly transformative work is bogged down by a total lack of citations that reference the admirable work of others."

How is that "poisonous" or "rude"? Was the reviewer correct that you failed to cite important work of others?

I feel our work has been entirely sabotaged by this process and have placed no fewer than 4 phone calls to the program manager to ask for clarification. Are proposals normally reviewed 9 times? How can other reviewers have such glowing things to say?

Like all peer review, grant review is a subjective process. This is why more than one review is relied upon.

After 4 phone calls and 6 emails, I still have yet to hear from the person in charge. Any advice?

Move on, and start writing a better proposal for your next submission.

Anonymous said...

You might also have 9 reviews if your proposal is read by two different panels. In my interdisciplinary area, it's not uncommon for two panels to review proposals.

Swede said...

Most proposers work pretty hard on their proposals and make a decent case. So most of the proposals are in fact "good," in my experience. Then the NSF rating of Good means exactly what it says: worthy of support - but some others are better. Both the words and the placement in the scale communicate intent accurately.

For NSF, it is useful to have more discrimination on the top end of the scale. The written comments should justify the discrimination and tell the proposer where to improve. But there is no need to carefully distinguish the paths to Fail: most proposals that I rate Poor or Fair clearly did not get the same level of effort as the Good ones, and it's not worth my time to scold them or tell them they left out what they already know they left out.

One exception is when I review for a particular program that draws a lot of inexperienced proposers (purposefully, in aiming to broaden the fold). Reviews that educate are explicitly requested by the program officers. Here, the problems in low-ranked proposals tend to be ones of omission, so I point out what is missing and why it matters. The errors tend to be similar, so Cut and Paste are my friends.

chablooi, the Fair review may grate, but to my eye the rest of them are not strong enough to carry your case. Suck it up and revise.

John Vidale said...

Good has little meaning as an absolute measure of a proposal. If there is only enough money to fund 1/3 of a set of proposals, the other 2/3 are, by the only possible definition, not good enough.

Few proposals have no chance of discovery - even if one knew the proposers would do exactly what they propose, which few do. It's econ at work, supply vs demand.

Calling not-good-enough proposals good seems designed as balm for egos and as ammunition for internal agency funding shifts and pleas for increased budgets overall. Occasionally it is a legitimate tactic that allows for the possibility that more funding suddenly appears, such as those stimulus funds.

Some comments from reviewees remind me of undergrads who don't understand how grades are curved. I guess the greater danger is of reviewers and panelists who don't understand curves, akin to profs who refuse to curve their tests.