
Awesome

Software and Libraries

  • Grobid (GitHub, Doc, Demo) is a machine learning library for extracting, parsing and restructuring raw documents such as PDFs into structured XML/TEI-encoded documents, with a particular focus on technical and scientific publications. Development started in 2008 as a hobby, and the tool was released as open source in 2011. Work on Grobid has continued steadily as a side project since then and is expected to continue as such.
  • Biblio-glutton (GitHub) is a bibliographical reference matching service. Given a raw bibliographical reference and/or a combination of key metadata, the service returns the disambiguated bibliographical object, including its DOI and a set of metadata aggregated from Crossref, PubMed and other awesome bibliographical resources.
  • spaCy (GitHub, Doc) is an industrial-strength natural language processing library for Python.
  • Doccano (GitHub, Demo) is an open source text annotation tool for machine learning practitioners.
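
As a rough illustration of how a raw reference might be handed to Grobid for parsing, the sketch below builds a POST request for Grobid's citation-parsing REST endpoint using only the Python standard library. It assumes a Grobid server is already running locally on port 8070; the function names (build_citation_request, parse_citation) are hypothetical helpers written for this example and are not part of Grobid itself.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a Grobid server is running locally on port 8070.
GROBID_URL = "http://localhost:8070"


def build_citation_request(raw_citation: str, base_url: str = GROBID_URL) -> Request:
    """Build (but do not send) a POST request asking Grobid to parse one raw reference."""
    # The raw reference string is sent as the form field "citations".
    body = urlencode({"citations": raw_citation}).encode("utf-8")
    return Request(
        f"{base_url}/api/processCitation",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_citation(raw_citation: str, base_url: str = GROBID_URL) -> str:
    """Send the request and return the response body as text (needs a running server)."""
    with urlopen(build_citation_request(raw_citation, base_url)) as response:
        return response.read().decode("utf-8")
```

Calling parse_citation("Charles Sutton and Andrew McCallum (2012), An Introduction to Conditional Random Fields") against a live server should return the parsed reference as TEI XML. Building the request separately from sending it keeps the example inspectable without a server.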

Data

  • Crossref interlinks millions of items from a variety of content types, including journals, books, conference proceedings, working papers, technical reports, and data sets. Linked content includes materials from Scientific, Technical and Medical (STM) and Social Sciences and Humanities (SSH) disciplines.
  • PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics.
  • Unpaywall is an open database of 25+ million free scholarly articles.
  • Lens.org is a free and open search platform for patents and scholarly articles.
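
Crossref and Unpaywall both expose public REST APIs keyed by DOI. The sketch below only constructs the request URLs, using the Python standard library, so it can be read and tried without network access. The helper names are hypothetical, and the email address is a placeholder that you would replace with your own (Unpaywall asks callers to identify themselves with one).

```python
from urllib.parse import quote, urlencode


def crossref_work_url(doi: str) -> str:
    """URL for the Crossref metadata record of a single DOI."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/")


def crossref_search_url(reference: str, rows: int = 1) -> str:
    """URL for a Crossref bibliographic search on a free-text reference."""
    params = urlencode({"query.bibliographic": reference, "rows": rows})
    return "https://api.crossref.org/works?" + params


def unpaywall_url(doi: str, email: str) -> str:
    """URL for the Unpaywall open-access record of a DOI."""
    return (
        "https://api.unpaywall.org/v2/"
        + quote(doi, safe="/")
        + "?"
        + urlencode({"email": email})
    )


if __name__ == "__main__":
    # DOI taken from the Articles list; the email is a placeholder.
    print(crossref_work_url("10.1038/nbt.4049"))
    print(crossref_search_url("In-text patent citations: A user's guide"))
    print(unpaywall_url("10.1038/nbt.4049", "you@example.org"))
```

Each URL can be fetched with any HTTP client and returns JSON. Check the current API documentation for rate limits and usage etiquette before running bulk queries.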

Articles

  • Kevin A. Bryan, Yasin Ozcan, Bhaven Sampat, In-text patent citations: A user's guide, Research Policy, Volume 49, Issue 4, 2020, 103946, ISSN 0048-7333, https://doi.org/10.1016/j.respol.2020.103946
  • Osmat Jefferson et al. Mapping the global influence of published research on industry and innovation. Nat Biotechnol 36, 31–39 (2018). https://doi.org/10.1038/nbt.4049
  • Matt Marx and Aaron Fuegi, Reliance on Science: Worldwide Front-Page Patent Citations to Scientific Articles (November 15, 2019). Boston University Questrom School of Business Research Paper No. 3331686, http://dx.doi.org/10.2139/ssrn.3331686
  • Charles Sutton and Andrew McCallum (2012), An Introduction to Conditional Random Fields, Foundations and Trends® in Machine Learning: Vol. 4: No. 4, pp 267-373. http://dx.doi.org/10.1561/2200000013
  • Dominika Tkaczyk, Andrew Collins, Paraic Sheridan and Joeran Beel, Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers, 2018, arXiv:1802.01168 [cs.DL]