Paper to HTML Tutorial

Last updated: August 10, 2022

Paper to HTML is a web app that converts scientific papers into HTML. It was designed primarily to help improve the accessibility of scientific papers to blind and low vision users and users of assistive reading technology like screen readers or text-to-speech, though it may also benefit users of mobile or small screen devices. This tutorial describes the main features of this site.

Upload a paper to Paper to HTML

Select the "Choose File" button, and select a paper to process. This file is most often a PDF, but can also be the LaTeX source or XML document representing the paper. Once you have selected a paper, click the "Upload" button to begin processing. Processing usually takes between 30 seconds and 2 minutes for each new paper uploaded into the system. During this time, the system runs a series of machine learning models to extract content from the document.
When processing finishes, the resulting HTML is shown

This page can be bookmarked so you can return to the paper later. The bookmark name should default to the title of the uploaded paper, if the title was extracted correctly.
If major issues occur when processing the paper, the errors are listed at the top of the page. These errors may indicate a low quality extraction.
Navigate between sections of the paper using the extracted section headers

These are surrounded by the <h2> HTML tag, for example: the Data & Methods section
Navigate to extracted tables and figures

These are surrounded by the <figure> HTML tag, for example: Figure 1
Section headings and extracted tables and figures are listed under the Table of Contents, which is located near the top of the page following the title and authors
Individual sections within the paper can be shared

For example, this link goes to the Data & Methods section of the research paper about this app.
Some tables are converted into HTML for improved ease of reading

For example: Table 2 from an example paper
Bibliography entries are presented in the last section of the document, under the heading "References," as below:
- D. Ahmetovic, T. Armano, C. Bernareggi, M. Berra, A. Capietto, S. Coriasco, N. Murru, Alice Ruighi, and E. Taranto. 2018. Axessibility: a LaTeX Package for Mathematical Formulae Accessibility in PDF Documents. Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility (2018).
- Saleema Amershi, Daniel S. Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi T. Iqbal, P. Bennett, Kori Inkpen Quinn, J. Teevan, Ruth Kikin-Gil, and E. Horvitz. 2019. Guidelines for Human-AI Interaction. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (2019).
Inline citations in the document are linked to their corresponding entries in the bibliography; return links in the bibliography after each reference entry can take the reader back to their previous reading location
Low quality extractions are labeled

You may encounter the following text: "Not extracted; please refer to original document." These cases represent low quality extractions where the user may be better off going to the source document.
System shortcuts like Ctrl-F (find) and Ctrl-C (copy) work well with text in the HTML render
Paper to HTML can be used in conjunction with various web translation tools

For example, Google Translate (available for Chrome and Firefox) can quickly translate the whole document into any of 133 languages. These are external tools, so we have not validated their output and cannot guarantee its quality.
You can also save the HTML document for offline reading

Select 'File, Save as...' in your browser. However, please note that the within-document navigation links will not function in the saved page.
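Because section headers are wrapped in the <h2> tag and tables and figures in the <figure> tag, a saved copy of the page can be outlined offline with a short script. The following is a minimal sketch using only Python's standard library, not part of Paper to HTML itself. The names OutlineParser, outline, and SAMPLE are hypothetical, and the sample markup is an assumption about what a rendered page might contain.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect the text of every <h2> and count <figure> elements."""

    def __init__(self):
        super().__init__()
        self.headings = []     # text of each <h2>, in document order
        self.figure_count = 0  # number of <figure> start tags seen
        self._in_h2 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []
        elif tag == "figure":
            self.figure_count += 1

    def handle_data(self, data):
        # Only keep text that appears inside an open <h2>.
        if self._in_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            self.headings.append(text)
            self._in_h2 = False


def outline(html_text):
    """Return (list of <h2> headings, number of <figure> elements)."""
    parser = OutlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.headings, parser.figure_count


# Hypothetical sample markup; a real saved page will be much larger.
SAMPLE = """
<h1>Example Paper</h1>
<h2>Introduction</h2><p>Text.</p>
<h2>Data &amp; Methods</h2>
<figure><figcaption>Figure 1</figcaption></figure>
"""

if __name__ == "__main__":
    headings, figures = outline(SAMPLE)
    for heading in headings:
        print(heading)
    print("figures:", figures)
```

To run this on a saved page, read the file into a string and pass it to outline, for example outline(open("paper.html", encoding="utf-8").read()), where paper.html is a placeholder for whatever filename you chose when saving.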
Try it out!

Still have unanswered questions? Check out our about page or email accessibility@semanticscholar.org.