How to cite: Mandoli, DF 2004 The Bioethics Imperative XV. Ethics
and the Literature: Citations III. ASPB News. January/February,
31(1): 9. http://www.aspb.org/newsletter/janfeb04/08mandoli15.cfm
BIOETHICS
The Bioethics Imperative XV
Ethics and the Literature: Citations III
Mokita: the truth we all know and agree not to talk about.
Scenario:
Frank Lee Nayeff has just cloned his first gene. He eagerly searches
GenBank for homologies and finds hundreds of partial matches.
He is bewildered by the various functions of the genes with homology
and goes home late that night discouraged. Fortunately, the next morning,
his adviser shows him that if he strips off the vector sequences, the
confusion is resolved. Frank trots happily to the library to search
for older references on the physiology of the protein that he has cloned
and, a month later, submits a manuscript on his clone and the 20-year
history of the physiology of this protein. He and his adviser are both
humbled by reviewers' comments that the literature on this important
protein actually goes back 50 years, a body of literature that Frank
missed because of changes over time in the terms used to describe the
physiology involved.
How many databases do you use? Do you know if they are curated, that is,
whether humans have processed each incoming file for accuracy? How is each
database organized and updated? When you search several databases for the
same information, do you adapt your terms and strategy to match each
database's requirements? Does a small or zero retrieval mean that you
missed something, or was there actually little or nothing to be found?
In molecular biology,
a curated database makes a difference in the quality of the information
retrieved because bad data can mislead, distort, or lead to serious errors.
SwissProt (http://www.ebi.ac.uk/swissprot/index.html)
is curated, but GenBank (http://www.ncbi.nlm.nih.gov/Genbank/index.html)
is not. In a research apprenticeship course, we had students search GenBank
for a common vector sequence using Sequencer. In minutes, we found
hundreds of examples of vector sequences. Obviously, the genes had not
been stripped of the vectors used to clone them prior to submission, and
we had a chuckle over some of the famous people who had entered vectors (some
without inserts) in GenBank.
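The classroom exercise above can be approximated in a few lines of Python. This is only a minimal sketch: it uses exact string matching, whereas real vector screening relies on sequence alignment against a curated vector collection. The fragment names and sequences below are made up for illustration and are not taken from any real cloning vector.

```python
# HYPOTHETICAL vector fragments, invented for this illustration only.
VECTOR_FRAGMENTS = {
    "left_flank": "GGATCCAAGCTT",
    "right_flank": "GAATTCCTCGAG",
}


def find_vector_hits(sequence, vector_fragments):
    """Return (fragment_name, start_index) for every exact match in sequence."""
    seq = sequence.upper()
    hits = []
    for name, fragment in vector_fragments.items():
        frag = fragment.upper()
        start = seq.find(frag)
        while start != -1:
            hits.append((name, start))
            start = seq.find(frag, start + 1)
    return hits


def trim_flanks(sequence, left_flank, right_flank):
    """Remove a vector flank from each end of sequence, if present."""
    seq = sequence.upper()
    left = left_flank.upper()
    right = right_flank.upper()
    if left and seq.startswith(left):
        seq = seq[len(left):]
    if right and seq.endswith(right):
        seq = seq[:-len(right)]
    return seq


# A made-up insert surrounded by the made-up flanks.
INSERT = "ATGGCCATTGTA"
SUBMITTED = (
    VECTOR_FRAGMENTS["left_flank"] + INSERT + VECTOR_FRAGMENTS["right_flank"]
)

if __name__ == "__main__":
    print(find_vector_hits(SUBMITTED, VECTOR_FRAGMENTS))
    print(trim_flanks(SUBMITTED, VECTOR_FRAGMENTS["left_flank"],
                      VECTOR_FRAGMENTS["right_flank"]))
```

Running a check like this before a homology search is what would have spared Frank his discouraging evening: any hit means the sequence still carries vector and should be trimmed first.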
Another data retrieval
concern is that a search term might not exist prior to a particular date.
For example, until 1983, the term "AIDS" did not exist in MEDLINE.
Informally, the disease was called "gay-related immunodeficiency
syndrome," or "the gay cancer." The disease was going strong,
but from 1979 to 1982, articles about it were indexed under "immunologic
deficiency syndromes." Articles were reindexed under "acquired
immunodeficiency syndrome" in 1983; then, if the term "AIDS"
was entered, MEDLINE automatically mapped the acronym to the new,
correct search term. Without checking the indexing history, a searcher
using the term "AIDS" might well conclude that the disease
did not exist until 1983, and missing all the pre-1983 literature would
make that searcher look a bit foolish. How can you deal with this issue?
Ask a librarian to show you how to read "scope notes" in the database
thesaurus and how to read the chronology of term changes, additions, and
deletions for a database of interest.
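One way to keep track of what a term chronology tells you is to record it as a small lookup table and ask it which index terms apply to the years you want to search. The sketch below is a hypothetical personal note-taking structure, not MEDLINE's actual thesaurus format or API. The years and terms are the ones given in the AIDS example above, simplified to one index term per period; the names `TermPeriod`, `TERM_HISTORY`, and `terms_for_years` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TermPeriod:
    """One index term and the years during which it was the term in use."""
    term: str
    first_year: int
    last_year: Optional[int]  # None means the term is still in use


# Chronology transcribed by hand from the example in the text (an assumption
# about how a searcher might record it, not an official data source).
TERM_HISTORY = {
    "AIDS": [
        TermPeriod("immunologic deficiency syndromes", 1979, 1982),
        TermPeriod("acquired immunodeficiency syndrome", 1983, None),
    ],
}


def terms_for_years(concept, start_year, end_year):
    """Return every index term whose period overlaps start_year..end_year."""
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    terms = []
    for period in TERM_HISTORY.get(concept, []):
        # Treat an open-ended period as running through the end of the query.
        period_end = period.last_year if period.last_year is not None else end_year
        if period.first_year <= end_year and period_end >= start_year:
            terms.append(period.term)
    return terms


if __name__ == "__main__":
    # A search spanning 1980 to 1990 needs both the old and the new term.
    for term in terms_for_years("AIDS", 1980, 1990):
        print(term)
```

The point of the exercise is the overlap test: a search that spans a term change has to use every term that was current at some point in the range, not just today's term.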
As in the science they report, reproducibility and consensus are the
criteria used to update databases.
Until something is reproducible and, therefore, credible, a new topic
is often subsumed under a broader heading. Most database producers will
add a new search term or concept, drug name, and so forth only after a
significant number of articles have been written on the topic. For a database
such as MEDLINE (PubMed), the National Library of Medicine also
uses the input of librarians, physicians, and researchers when deciding
to add, change, or delete search terms. Agricola
(http://www.nal.usda.gov/ag98/ag98.html) probably updates its search
terms in a similar way.
It is good practice
to read each database's published description to understand seven
critical factors: (1) the journals covered; (2) the span of coverage in
years; (3) whether a controlled vocabulary (specific index terms that
must be used), free text, or both are used; (4) whether indexing is done
by machines or people, and the first language of the indexers (if done
by people); (5) the indexing priority (e.g., are articles indexed from
the top-10 plant biology journals before the top-10 insect journals are
indexed, implying that citations for plant biology will be more current
than those for insect journals?); (6) the time lag between journal publication
date and the date that articles appear online; and (7) the mechanism or
procedure by which the database producer collects suggestions for new
terms, new capabilities, and new journals.
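The seven factors above can be kept as a simple checklist for each database you use. The sketch below is one hypothetical way to do that in Python; the class name, field names, and the example entry ("ExampleDB") are all invented for illustration and do not describe any real database.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DatabaseProfile:
    """Notes on one database, one field per factor in the list above."""
    name: str
    journals_covered: Optional[str] = None      # (1) journals covered
    coverage_years: Optional[str] = None        # (2) span of coverage in years
    vocabulary: Optional[str] = None            # (3) controlled, free text, or both
    indexing_method: Optional[str] = None       # (4) machine or human indexers
    indexing_priority: Optional[str] = None     # (5) which journals are indexed first
    publication_lag: Optional[str] = None       # (6) lag from print to online
    suggestion_mechanism: Optional[str] = None  # (7) how suggestions are collected


def missing_factors(profile):
    """List the factors that have not yet been looked up for this database."""
    return [
        f.name
        for f in fields(profile)
        if f.name != "name" and getattr(profile, f.name) is None
    ]


if __name__ == "__main__":
    # "ExampleDB" is a placeholder, not a real database.
    profile = DatabaseProfile(
        name="ExampleDB",
        vocabulary="controlled vocabulary plus free text",
    )
    print(missing_factors(profile))
```

An empty result from `missing_factors` means you have at least looked up every factor; it says nothing about whether the database suits your search.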
In biotech pharmaceutical
work, even the computers used in research have to be validated for accurate
performance. Furthermore, when biotech scientists include a literature
search in a U.S. Food and Drug Administration application, they must explain
why certain databases were chosen and describe them using factors such
as those listed above. Early on, students should be required to get in
the habit of seeking out each database's capabilities and limitations so they
can quickly ascertain which databases match their search requirements
and what search techniques are specific to each database. Again, a librarian
can help set you on the fast track.
Next: A summary of citation guidelines.
Tamara Turner
Librarian and editor, Seattle
Dina Mandoli
mandoli@u.washington.edu