PIVAJ stands for Plateforme d’Indexation et de Visualisation d’Archives de Journaux, which means Newspaper Archives Indexing and Visualizing Platform.
The platform has two parts, as described in our Archiving paper.
The offline part analyses digitized newspapers to progressively extract the logical layout of the paper, from the issue to the sections and down to the articles, individual text blocks and illustrations. An ALTO/METS description of the issue is then produced. The scientific work involved in this part is described here.
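To make the hierarchy concrete, here is a minimal Python sketch of a logical layout running from the issue down to text and illustration blocks. It is only an illustration: the class names, fields and sample values are assumptions made for this example, and they reflect neither PIVAJ's internal data model nor the ALTO/METS schemas.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Leaf of the layout: a text block or an illustration (hypothetical model)."""
    kind: str            # "text" or "illustration"
    content: str = ""    # transcribed text, empty for illustrations


@dataclass
class Article:
    title: str
    blocks: List[Block] = field(default_factory=list)


@dataclass
class Section:
    name: str
    articles: List[Article] = field(default_factory=list)


@dataclass
class Issue:
    newspaper: str
    date: str
    sections: List[Section] = field(default_factory=list)


def count_blocks(issue: Issue, kind: str) -> int:
    """Walk issue -> sections -> articles -> blocks and count blocks of one kind."""
    return sum(
        1
        for section in issue.sections
        for article in section.articles
        for block in article.blocks
        if block.kind == kind
    )


# Made-up sample data, for illustration only.
issue = Issue(
    newspaper="Example Gazette",
    date="1900-01-01",
    sections=[
        Section(
            name="Local news",
            articles=[
                Article(
                    title="Sample article",
                    blocks=[Block("text", "Sample text."), Block("illustration")],
                )
            ],
        )
    ],
)
```

Calling `count_blocks(issue, "text")` walks the same issue, section, article and block levels that the offline analysis extracts.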
The online part is a Web site that gives access to the newspaper archive: the digitized images, the extracted logical layout, and the automatically transcribed (“OCRed”) text. We provide tools to crowdsource the correction of the transcription. Our first prototype is shown in this video:
You can play with it here.
Our latest prototype, which contains all the features described in the Archiving paper, can be found there.