Full details, including methods and example data analyses, are available in a Nature Methods paper [1]. The methodology combines two techniques. The first is topic modeling using Latent Dirichlet Allocation [2-5], a Bayesian statistical algorithm that automatically discovers meaningful categories from unstructured text, independent of keywords or preconceived categorical designations. The second is a graph-based layout algorithm [6-8], which produces a two-dimensional visualization in which documents are clustered by their overall topic- and word-based similarity to one another. These two complementary methods are combined in an interactive web-based format [9] that provides a context in which grants are categorized and clustered based on the language used by researchers.
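To make the first step more concrete, here is a minimal sketch of Latent Dirichlet Allocation fitted by collapsed Gibbs sampling (the approach described in [3]), followed by a cosine similarity between the resulting per-document topic mixtures. It is an illustration only, not the pipeline used to build the database: the toy documents, the function names `lda_gibbs` and `cosine`, and the values of `num_topics`, `alpha`, `beta` and `iters` are all assumptions chosen for this example. The graph layout step [6-8] is not reproduced; the similarity matrix is simply the kind of input such a layout would take.

```python
import random


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic mixtures."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every word token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token
                # (unnormalised; random.choices normalises the weights).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each document; each row sums to 1.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


def cosine(a, b):
    """Cosine similarity between two topic-mixture vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


# Hypothetical toy "grant abstracts", already tokenised.
docs = [
    "neuron synapse cortex neuron plasticity".split(),
    "synapse plasticity cortex neuron".split(),
    "tumor oncogene mutation tumor therapy".split(),
    "mutation therapy oncogene tumor".split(),
]

theta = lda_gibbs(docs, num_topics=2)
sim = [[cosine(a, b) for b in theta] for a in theta]
```

Each row of `theta` is one document's mixture over topics, and `sim[i][j]` scores how alike documents `i` and `j` are in topic space. A layout algorithm would then place documents with high similarity close together in two dimensions.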
References
1. Talley, E. M., et al. (2011). Database of NIH grants using machine-learned categories and graphical clustering. Nature Methods, 8, 443-444.
2. Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993-1022.
3. Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proc Natl Acad Sci U S A, 101 Suppl 1, 5228-5235.
4. Blei, D., & Lafferty, J. (2009). Topic Models. In A. Srivastava & M. Sahami (Eds.), Text Mining: Theory and Applications. Taylor and Francis.
5. Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in Semantic Representation. Psychological Review, 114(2), 211-244.
6. Boyack, K. W., Wylie, B. N., & Davidson, G. S. (2002). Domain Visualization Using VxInsight for Science and Technology Management. Journal of the American Society for Information Science and Technology, 764-774.
7. Davidson, G. S., Wylie, B. N., & Boyack, K. W. (2001). Cluster Stability and the Use of Noise in Interpretation of Clustering. Proc. IEEE Information Visualization, 23-30.
8. Martin, S., Brown, W. M., Klavans, R., & Boyack, K. W. (2008). DrL: Distributed Recursive (Graph) Layout. SAND Reports, 2936, 1-10. (Available on request.)
9. Herr, B. W., et al. (2009). The NIH Visual Browser: An Interactive Visualization of Biomedical Research. IEEE International Conference Information Visualisation, 505-509.