Wednesday, June 30, 2010

Google Scholar v. Web of Science

From the comments to yesterday's post:

ISI/Web of Science (WoS) is "better than Google Scholar by an order of magnitude."

"..my citations are definitely higher on Scopus than on WoS."

"Web of Science has a few errors in my records, though not nearly as bad as Google Scholar.."

"I prefer Google Scholar.. My prediction is that WOS will decline in popularity over time unless it makes drastic changes."

OK, so let's do the numbers.

I compared citation data in Google Scholar and Web of Science for 25 of my publications. (I did not search in Scopus).

I looked at a range of publications in terms of publication date, my place in the authorship order, and type of publication. For 18 of the 25 publications, Web of Science counted more citations, so I definitely like WoS better. For these 18, Google Scholar's citation count ranged from 0 to 92% of the citations in WoS; the average was 62%.

For 3 of the 25, Google Scholar counted the same number as WoS, and for 4 others Google Scholar counted more citations, although typically only slightly more than WoS (the WoS count was 84-92% of the GS count). There aren't enough data for me to conclude anything systematic based on these small numbers, but I was intrigued by the fact that 2 of the publications that had a higher citation count in GS than in WoS were in topics outside my primary research field.

The publications for which Google Scholar did a significantly worse job of finding citations than WoS (i.e., finding <40% of the citations listed in WoS) were typically my oldest and my most recent publications, although there is one paper published in 2002 in a mainstream journal for which GS found <40% of the citations listed in WoS.
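For anyone who wants to repeat this kind of comparison on their own record, the arithmetic above can be sketched in a few lines of Python. This is only an illustration: the paper names and counts below are made up (they are not my data), and the function name `summarize` is just a placeholder.

```python
from statistics import mean

# Hypothetical (paper, GS count, WoS count) rows, for illustration only.
counts = [
    ("paper A", 46, 50),
    ("paper B", 12, 40),
    ("paper C", 0, 8),
    ("paper D", 30, 30),
    ("paper E", 55, 50),
]


def summarize(rows):
    """Tally which database counts more, and express GS as a percent of WoS
    for the papers where WoS counts more."""
    wos_more = [(n, g, w) for n, g, w in rows if w > g]
    same = [(n, g, w) for n, g, w in rows if w == g]
    gs_more = [(n, g, w) for n, g, w in rows if g > w]

    # Here w > g >= 0, so w is never zero and the division is safe.
    shares = [100.0 * g / w for _, g, w in wos_more]

    return {
        "wos_more": len(wos_more),
        "same": len(same),
        "gs_more": len(gs_more),
        "min_share": min(shares) if shares else None,
        "max_share": max(shares) if shares else None,
        "mean_share": mean(shares) if shares else None,
    }


result = summarize(counts)
print(result)
```

Note that the average here is a simple mean of the per-paper percentages, which is one reasonable reading of "the average was 62%"; a ratio of total counts would weight highly cited papers more heavily and could give a different number.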

These results are not surprising; it is not news that these sites are not perfect at counting citations.

These databases are very useful for doing literature searches, and should be used primarily for this purpose rather than as key data in decision-making about jobs, promotions, and awards. Nevertheless, I have been on committees in which various members exclusively used one or the other of these sites for looking up the publication records of applicants/nominees, and I have seen citation numbers listed in many CVs and in letters of recommendation (typically without reference to which citation index was used to determine those numbers).

To some extent, this is OK. A very high number of citations is impressive, whether it is 250 or 320. For some of my papers with more modest numbers of citations, though, I might as well just make up a number between 5 and 50 as rely on the count in either Google Scholar or Web of Science.

Even so, for my field (or subfield) of the physical sciences, Web of Science is definitely "better" at counting citations for most publications. For those of you who prefer GS to WoS, perhaps you could leave a comment indicating your field. Are there particular fields for which GS is better at finding citations?

31 comments:

David Stern said...

Google Scholar records many more (more than double) citations to my work than either WoS or Scopus individually. My h-index according to ISI is 18 and according to Google Scholar 25. I'm in economics/environmental studies. My comments yesterday were about accuracy. There is a lot of variation in how articles from non-ISI journals are cited in WoS, but Google Scholar has heaps of completely BS entries. Some of the supposed citations to my papers simply do not cite them at all; others are listed because they appeared in the same issue of a journal or on the same website as one of my papers, and all kinds of strange things like that. So it is much noisier (not surprisingly given its methodology) than WoS and Scopus.

Estraven said...

For pure mathematics, GS > MathSciNet >> WoS. One of the factors may be that lots of papers are quoted as preprints, as typically the interval between arxiv appearance and publication is one year or longer.

Also, wrong papers tend to gather record amounts of citations very fast, so citation numbers are never taken seriously - we judge researchers on a combination of reference letters and international recognition.

Rick said...

I'm in a strange field, but it is relatively close to Computer Science. One of the weird things about computer science is that the majority of top venues for publication are conferences, not journals. As in, the best research is published in archived conference proceedings and never in a journal. Therefore, papers at top conferences are enough for tenure at most computer science departments. WoS doesn't index these conferences (but does index some of the less-prestigious computer science journals), so Google Scholar tends to have much higher citation numbers for my work.

Rob1606 said...

GS >> WoS - I am in Theoretical Computer Science.

Below I list my top five papers according to GS and the percentage of citations that WoS finds.

1. 20%
2. 50%
3. 0%
4. 10%
5. 16%

Anonymous said...

Yes! Google Scholar is much better for the humanities. For example, in linguistics many journals aren't even indexed in Web of Science. WoS basically only includes psycholinguistic and neurolinguistic journals. But if you do, e.g., semantics research, two key journals, "Journal of Semantics" and "Linguistics and Philosophy", aren't indexed, so you are out of luck. Further, Google Scholar includes citations to conference and workshop papers. These are critical in some newer fields like Artificial Intelligence and Computational Linguistics. This is where the new results appear. Eventual journal papers are often reworked conference papers of "old" results. And for many liberal arts fields people are still focused on book publishing. For those fields Google Scholar offers a relatively objective measure of impact. This is a welcome innovation.

Anonymous said...

My fields are scattered, though most of my citations in WoS are for my latest field (bioinformatics). The differences between Google Scholar and Web of Science are dramatic, with h-indexes of 31 and 21 respectively. The biggest difference seems to be the conference articles from when I was in computer engineering: Web of Science ignores them, but Google Scholar finds 113 citations for one of them, and several others contribute to the h-index with citation counts over 31.

The Web of Science has 41 papers for me (including 4 with 0 citations) but the "distinct authors" set only attributes 26 of them to me. They claim I can fix that with ResearcherID, but I've had 55 papers there for about a year, and they've not fixed it yet. Also I find their web interface to be so painful that I gave up on it. Google Scholar claims 205 papers, but the correct number is around 85, counting tech reports and other informal publication, so Google has failed to merge a lot of the citations. Some of the low-count citations are for papers that never existed, and some are by other authors with the same last name which Google has mistakenly put with me. One is for a conference paper that I had forgotten about (I never put it on my CV, and don't even know if I still have a copy).

The most cited paper is the same in both cases, with 859 for Google Scholar and 596 for Web of Science. Some of the difference is double counting by Google, but I don't think that all of it is. Web of Science tends not to count most conference citations. One of my early papers is in a field not much indexed by Web of Science (computer music), getting 64 citations in Web of Science and 251 in Google Scholar. Of course, searching for the name others have given to the technique gets about 168,000 hits on regular Google; the computer music field is not much given to academic citation.

Towards the bottom of the list, Google Scholar has a lot of incomplete citations that have not been merged with the correct citations at the beginning of the list, and some ludicrously wrong citations. I checked one of these weird citations---it was from an unpublished multi-author paper on the web, and did refer to my work, but in the wrong journal and with volume and page numbers that were not plausible for the journal they selected, so just students too lazy to look up a citation.

Anonymous said...

I did a more thorough analysis of my citations in
http://gasstationwithoutpumps.wordpress.com/2010/06/30/google-scholar-vs-web-of-science/

Anonymous said...

For Plant Biology GS typically gives 30% more citations than WoS.

Anonymous said...

In computer science GS finds a lot more than WoS but I suppose that's to be expected...

Anonymous said...

My institution just canceled our subscription to WoS and now we only have Scopus. I'd be interested to see how Scopus compares. Has anyone done a comparison similar to the thorough approach FSP took?

Anonymous said...

Prompted by your post (and the need to procrastinate on working on something I am not wanting to work on), I did some comparing myself (I hit the "ignore patents" button to make the searches more comparable). I am in biomedical science. The differences were very large, and not necessarily matching the trend you saw for your pubs. Most striking, the order of my top cited pubs was quite different in the two lists. For my top cited publication in GS, which was second in WOS (it's a review--shows how much these citation counts mean for the quality of my work) the difference was striking. 909 for GS and 715 for WOS. Top pub as listed in WOS (at least this was a real scientific paper), 774 in GS and 722 in WOS. However, GS then significantly undercounts my next highest one, which is an OLD paper by today's standards (1983), with GS at 336 and WOS at 498. The next couple papers are more similar, only 20-40 citations different, with GS again having the higher count.

I then jumped to the 5th page (citations in the 50s). GS boosted citations for some papers from the 40s into the 60s. However, it also undercounted at least some papers from the 1990s.

Going deeper, GS boosted some papers from 15 to 30 citations--maybe folks are reading those after all.

Bottom line, either they are counting different sorts of "publications", or they each have serious flaws in their counting mechanisms. In terms of ego boosting, however, my h factor differs by only one between the two; strangely, the errors balanced themselves out, at least in my case. Not statistics but...

Mark P

Anonymous said...

according to wikipedia:
The topic has been studied in detail by Lokman I. Meho and Kiduk Yang.[5][6] Web of Knowledge was found to have strong coverage of journal publications, but poor coverage of high impact conferences. Scopus has better coverage of conferences, but poor coverage of publications prior to 1996; Google Scholar has the best coverage of conferences and most journals (though not all), but like Scopus has limited coverage of pre-1990 publications.[6]

Amanda said...

Yes. See the following article in the May 2010 Computing Research Newsletter, about the impact on NRC rankings of their choice to rely solely on ISI data for citation counts, even though ISI data is very inaccurate for CS:
http://www.cra.org/resources/crn-online-view/dangers_of_rankings_with_inaccurate_data/

Anonymous said...

Field: Computer Science

Highest cited paper

GS: 200+ citations
WoS: 20+ citations

Anonymous said...

Does anyone like Scifinder? I'm in chemistry (it's run by Chemical Abstract Services). I haven't used WOS very much because I really like the Scifinder interface. I've never looked at the accuracy of citations, h-index, etc.

Anonymous said...

GS gives more citations in biochemistry and plant biology, except for the most recent papers. It also lists my PhD dissertation, which is online.

EscapedWestOfTheBigMuddy said...

Particle physics here:

* The primary resource in the field is spires which I use to compile the list of publications on my CV
* Google enormously under-reports citations for one of my better known papers (an order of magnitude!) but seems to be fully up-to-date on some others; most papers seem to be present at least
* WoS also misses a great many of my citations though it does seem to have most of my papers (some listed twice in two different volumes of the same journal)
* Didn't try scopus

Anonymous said...

I can't even find myself in WoS (computer science) but I have a lot of citations in google scholar!

Anonymous said...

I clearly am procrastinating as the comments prompted me to look deeper on the GS list. I now see:

1. that Ph.D. and undergrad theses and other non-journal pubs are also included, so that explains why some articles have much higher citation counts in my case (biomedical science)

2. Once I set up the search right, in my case GS did a very good job of sorting my real pubs (and those of the other folks with my relatively rare first initial and last name) from those that simply have a citation to me and thus also include my name. There were 154 total hits and probably 140 of these were real pubs by me or one of my namesakes. There were a few odd citations where someone in another country with a non-Roman character set must have posted a paper of mine, as well as some "In this Issue" type pieces that mentioned one of our papers.

Mark P

The Lesser Half said...

In the Earth sciences, conference abstracts are usually not considered publications, but google counts them, meaning that much vetting must be done to get a good count.

Recently I compared Scopus, WoS and GS in calculating my H index. None of them had all of my publications (except GS, which also had dozens of abstracts), and none of them had my H index right. So now I've taken to tabulating my references myself to make sure they (and my H-index) are correct.
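For anyone else tabulating by hand, the h-index itself is easy to compute once you have a vetted list of per-paper citation counts: it is the largest h such that h of the papers have at least h citations each. A minimal sketch in Python follows; the two lists of counts are invented for illustration and stand in for whatever each database reports for the same set of papers.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h


# Hypothetical per-paper counts for the same author from two sources.
source_a = [10, 8, 5, 4, 3]
source_b = [25, 8, 5, 3, 3]

print(h_index(source_a), h_index(source_b))
```

The example also shows why a single heavily cited paper barely moves the index: the second list has a much larger top count but a lower h-index.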

Female Science Professor said...

So there are lots of Google Scholar fans.. where are the Web of Science fans?

Alex said...

If my school shelled out for Web of Science I might have an opinion on it. The last time I was at an institution that had WoS was when I was a postdoc and did not yet have much of a citation history to check.

I do know that Google Scholar has missed some of the cases where I've been cited.

Ursula said...

I prefer WoS. There are three other scientists with the same name, and WoS can group publications by what it thinks are unique authors. A couple of publications of mine are missing in WoS, but it's much more accurate than GS, which attributes about 40 additional publications to me. The h-index at WoS is also a little higher.

Google Scholar has found 285 citations for one article (that is wrongly quoted, they used the online pre-print citation), and WoS only quotes 19 for the same one, so Google Scholar "looks" better, but they found many citations to this article that have been written years before the article got published.

Fields: Chemistry and Structural Biology

GMP said...

So there are lots of Google Scholar fans.. where are the Web of Science fans?

:) I am (sort of). I am in the physical sciences part of the STEM spectrum, and I find that my h-index is the same in WoS and GS. GS does not pick up all the citations in journals; my guess is proprietary/access issues (my most cited paper has only about 60% of WoS citations show up on GS). GS does a good job of counting citations for some journals that WoS may not index (they are new or not indexed for whatever other reason) and papers in conference proceedings (as others have said several times), and it also picks up citations of any work in people's theses and conference abstracts, which obviously WoS doesn't track.

In my field, where important results are published in journals, I find that WoS does a very good job of counting them (they also count citations that appear in certain conference proceedings, such as those published by some professional societies).

Of course, I have nothing against GS; if you really want to see who is following your work, you can use GS as a preview of WoS citations to come (e.g. it will pick up on citations in arXiv preprints which do eventually show up on WoS after those papers are published).

So I prefer WoS, but for other reasons too; for instance, the university nicely links through WoS to multiple publishers so it's a convenient route to downloading cited references. Let's not forget the ISI Journal Citation Index (impact factors): for my tenure case the impact factors of all the important journals in my field had to be collected and listed (was a requirement), and number of citations of all my papers published on TT had to be listed (both WoS and GS), and I was told by colleagues that at least a couple of people who wrote letters for my TT case talked about my h-index. So yeah, people are embracing the metrics with open arms in my department/field.

EliRabett said...

Anne Wil Harzing has created Publish or Perish for those who wish to waste even more time looking at what is on Google Scholar

http://www.harzing.com/pop.htm

Anonymous said...

Another particle physicist -- Spires is priceless for doing literature searches. I'm surprised more fields don't have their own specialized, very up-to-date databases.

Google Scholar and Web of Science both appear to drastically undercount citations to my papers. Part of this may be because Spires indexes preprints, not just publications.

Perceval said...

Yes - what the other computing scientists / linguists said. It depends a lot on the fields covered by WoS. My "most prestigious" paper is cited a lot as preprint, because linguistics journal papers tend to circulate as manuscripts for a long time before publication - and there are extremely influential publications that still haven't gone beyond the ms stage!

Anonymous said...

Well after reading all these comments I had to check...in my field WoS seems to be the standard so I have not really used GS before this.

My experience seems to be more similar to yours - i.e. GS counts less than WoS. But I was rather shocked by how consistent the difference was - of my top 10 most-cited papers nine of them had fewer citations in GS than in WoS. The tenth one, which has 163 citations in WoS, did not show up at all in GS - I searched and searched and could not find it. It's in a relatively prominent journal (JACS) so I don't know how to explain this.

Also I found that in GS my h-index is 11, versus 16 in WoS. I can't believe the difference - I'm an early tenure-track assistant professor so a difference this size really matters - I hope no one is using GS to judge my productivity!

For the record: I'm in materials science (some crossover into physics) and I don't normally list conference contributions on my CV - GS did find a bunch of these for me but none had very many citations. I think in my field these are not normally counted in e.g. tenure decisions.

So I guess now that I've checked, I'm definitely a fan of WoS. Interesting exercise, and interesting to see how much it depends on field!

Dave Backus said...

We're trading anecdotes here, but in economics it's not uncommon for WOS to miss out on 40-50% of the citations. We know that, because my junior colleagues track them one by one and compare to a reasonably intensive WOS search. Certainly typos play a role, but I can't tell you why there's such a big difference.

Another issue is what we're measuring. GS picks up things like PhD reading lists, which seem to me reasonable signs that the work is getting read.

I don't think there's a magic bullet, but we should (collectively) think about what it is we want to measure and how to do it. There's a series in Nature on the subject that captures a broad range of opinions, but I'm not sure where it leaves us.

Anonymous said...

I compared my citation counts in Google Scholar, Web of Science, Scopus, and SciFinder (and uselessly in MathSciNet and Spires) in
http://gasstationwithoutpumps.wordpress.com/2010/07/01/google-scholar-vs-scopus-and-scifinder/

Anonymous said...

Field: Computer Science (AI).

Google Scholar is much better for CS. Most people just focus on publishing in conferences and many of these are not indexed by WoS.

My top cited paper: 53 according to GoogleScholar, a mere 5 for WoS.

My AAAI papers (a top conference, considered better than practically any journal) do not even appear in WoS !!