From Cyclopaedia to Encyclopédie: experiments in machine translation and sequence alignment

Figure 1. Title page from the 1745 prospectus of the first Encyclopédie project. This page image is taken from ARTFL’s 18th Volume of the Encyclopédie.

It is well known that the Encyclopédie ou dictionnaire raisonné des sciences, des arts et des métiers began first as a modest translation project of Ephraim Chambers’ Cyclopaedia in 1745. Over the next few years, Diderot and D’Alembert would replace the original editors and the project would be duly transformed from a simple translation into an effort to compile and organise the sum total of the world’s knowledge. Over the course of their editorial work, Diderot, and most notably D’Alembert, were not shy in incorporating these translations of the Cyclopaedia as filler for the Encyclopédie. Indeed, ‘ils ont laissé une bonne partie de ces articles presque inchangés, ou avec des modifications insignifiantes’ (Paolo Quintili, ‘D’Alembert “traduit” Chambers. Les articles de mécanique de la Cyclopædia à l’Encyclopédie’, Recherches sur Diderot et sur l’Encyclopédie 21 (1996), p.75). The philosophes were nonetheless conscious of their debt to their English predecessor Chambers. His name appears some 1154 times in the text of the Encyclopédie and he is referenced as sole or contributing source to 1081 articles, where his name appears in italics at the end of a section or article. Given the scale of the two works under consideration, systematic evaluation of the extent of the philosophes’ use of Chambers has remained, even today, a daunting task. John Lough, in 1980, framed the problem nicely: ‘So far no one has had the patience to make a detailed study of the exact relationship between the text of Diderot’s Encyclopédie and the work of Ephraim Chambers. This would no doubt require several years of arduous toil devoted to comparing the two works article by article’ (‘The Encyclopédie and Chambers’ Cyclopaedia’, SVEC 185 (1980), p.221).

Recent developments in machine translation and sequence alignment now offer new possibilities for the systematic comparison of digital texts across languages. The following post outlines some recent experimental work in leveraging these new techniques in an effort to reduce the ‘arduous toil’ of textual comparison, giving some preliminary examples of the kinds of results that can be achieved, and providing some cursory observations on the advantages and limitations of such systems for automatic text analysis.

Our two comparison datasets are the ARTFL Encyclopédie (v. 1117) and the recently digitised ARTFL edition of the 1741 Chambers’ Cyclopaedia. The 1741 edition was selected as it was one of the likely sources for the original translation project and we were able to work from high-quality page images provided by the University of Chicago Library. (On the possible editions of the Cyclopaedia used by the encyclopédistes, see Irène Passeron, ‘Quelle(s) édition(s) de la Cyclopœdia les encyclopédistes ont-ils utilisée(s)?’, Recherches sur Diderot et sur l’Encyclopédie 40-41 (2006), p.287-92.) In a nutshell, our approach was to generate a machine translation of all of the Cyclopaedia articles into French and then use ARTFL’s Text-PAIR sequence alignment system to identify similar passages between this virtual French Cyclopaedia and the Encyclopédie, with the translation providing links back to the original English edition of the Chambers as well as links to the relevant passages in the Encyclopédie.

For the English to French machine translation of Chambers, we examined two of the most widely used resources in this domain, Google Translate and DeepL. Both systems provide useful Application Programming Interfaces [APIs] as part of their respective subscription services, and both provide translations based on cutting-edge neural network language models. We compared results from various samples and found, in general, that both systems worked reasonably well, given the complications of eighteenth-century vocabularies (in both English and French) and many uncommon and archaic terms (this may be the subject of a future post). While DeepL provided somewhat more satisfying translations from a reader’s perspective, we ultimately opted to use Google Translate for the ease of its API and its ability to parse the TEI encoding of our documents with little difficulty. The latter is of critical importance, since we wanted to keep the overall document structure of our dictionaries to allow for easy navigation between the versions.

Operationally, we segmented the text of the Cyclopaedia into short blocks, split at paragraph breaks, and sent them for automatic translation via the Google API, with a short delay between blocks. This worked relatively well, though the system would occasionally throw timeout or other errors, which required a query resend. You can inspect the translation results here – though this virtual French edition of the Chambers is not really meant for public consumption. Each article has a link at the bottom to the corresponding English version for the sake of comparison. It is important to note that the objective here is NOT to produce a good translation of the text or even one that might serve as the basis for a human edition. Rather, this machine-generated edition exists as a ‘pivot-text’ between the English Chambers and the French Encyclopédie, allowing for an automatic comparison of the two (or three) versions using a highly fault-tolerant sequence aligner designed to pick out commonalities in very noisy document spaces. (See Clovis Gladstone, Russ Horton, and Mark Olsen, ‘TextPAIR (Pairwise Alignment for Intertextual Relations)’, ARTFL Project, University of Chicago, 2008-2021, and, more specifically, Mark Olsen, Russell Horton and Glenn Roe, ‘Something borrowed: sequence alignment and the identification of similar passages in large text collections’, Digital Studies / Le Champ numérique 2.1 (2011).)
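The segment-and-send loop described above can be sketched roughly as follows. This is a minimal illustration and not the code used for the project: the `translate` argument is a hypothetical stand-in for whatever client function wraps the translation API, and the delay, retry wait, and attempt limit are assumed values chosen only for the example.

```python
import re
import time
from typing import Callable, List


def split_into_blocks(article_text: str) -> List[str]:
    """Split an article into short blocks at paragraph breaks (blank lines)."""
    blocks = re.split(r"\n\s*\n", article_text)
    return [block.strip() for block in blocks if block.strip()]


def translate_with_retry(
    block: str,
    translate: Callable[[str], str],
    max_attempts: int = 3,     # assumed value
    retry_wait: float = 5.0,   # assumed value, in seconds
) -> str:
    """Send one block for translation, resending it if the call fails."""
    for attempt in range(1, max_attempts + 1):
        try:
            return translate(block)
        except Exception:
            # Timeouts and other transient errors trigger a resend.
            if attempt == max_attempts:
                raise
            time.sleep(retry_wait)
    raise RuntimeError("unreachable")


def translate_article(
    article_text: str,
    translate: Callable[[str], str],
    delay: float = 1.0,        # assumed short pause between blocks
    retry_wait: float = 5.0,
) -> str:
    """Translate an article block by block, preserving paragraph order."""
    translated = []
    for block in split_into_blocks(article_text):
        translated.append(
            translate_with_retry(block, translate, retry_wait=retry_wait)
        )
        time.sleep(delay)
    return "\n\n".join(translated)
```

Passing the translation call in as a function keeps the sketch independent of any particular service, so the same loop could be pointed at either of the two systems compared above.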

The next step was to establish workable parameters for the Text-PAIR alignment system. The challenge here was to find commonalities between the French translations created by eighteenth-century authors and translators and machine translations produced by a modern automatic translation system. Additionally, the editors and authors of the Encyclopédie were not necessarily constrained to produce an exact translation of the text in question, but could, and did, make significant modifications to the original in terms of length, style, and content. To address this challenge we ran a series of tests with different matching parameters, such as n-gram construction (e.g., the number of words that constitute an n-gram), minimum match lengths, maximum gaps between matches, and decreasing match requirements as a match length increased (what we call a ‘flex gap’), among others, on a representative selection of 100 articles from the Encyclopédie where Chambers was identified as the possible source. It is important to note that even with the best parameters, which we adjusted to get favorable recall and precision results, we were only able to identify 81 of the 100 articles. (See comparison table. The primary parameters chosen were bigrams, stemmer=true, word len=3, maxgap=12, flexmatch=true, minmatchingngrams=5. Consult the TextPair documentation and configuration file for a description of these values.) Some articles, even where clearly affiliated, were missed by the aligner, due to the size of the articles (some are very small) and fundamental differences in the translation of the English. For example, the article ‘Compulseur’ is attributed by Mallet to Chambers, but the machine translation of ‘Compulsor’ is a rather more literal and direct translation of the English article than what is offered by Mallet. Further relaxing matching parameters could potentially find this example, but would increase the number of false positives, in effect drowning out the signal with increased noise.
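To make these parameters more concrete, here is a deliberately simplified sketch of n-gram matching with a minimum word length, a gap tolerance, and a minimum number of matching n-grams. It is not Text-PAIR’s algorithm: the `crude_stem` function is a hypothetical stand-in for a real stemmer, the chain is tracked on one side only, and the ‘flex gap’ behaviour is left out entirely. The default values simply echo the settings quoted above.

```python
import re
from typing import List, Tuple


def crude_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: keep the first five letters."""
    return word[:5]


def normalise(text: str, min_word_len: int = 3) -> List[str]:
    """Lowercase, keep alphabetic words of at least min_word_len letters, stem."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [crude_stem(w) for w in words if len(w) >= min_word_len]


def ngrams(tokens: List[str], n: int = 2) -> List[Tuple[str, ...]]:
    """Build overlapping n-grams (bigrams by default) from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def longest_chain(source: str, target: str, n: int = 2, max_gap: int = 12) -> int:
    """Count shared n-grams in the best chain found while reading the source.

    A chain survives up to max_gap consecutive unshared n-grams; beyond that
    it is abandoned and counting starts again.
    """
    target_grams = set(ngrams(normalise(target), n))
    best = run = gap = 0
    for gram in ngrams(normalise(source), n):
        if gram in target_grams:
            run += 1
            gap = 0
            best = max(best, run)
        elif run:
            gap += 1
            if gap > max_gap:
                run = gap = 0
    return best


def is_match(source: str, target: str, min_matching_ngrams: int = 5) -> bool:
    """Apply the minimum-matching-n-grams threshold to the best chain."""
    return longest_chain(source, target) >= min_matching_ngrams
```

Even in this toy form the trade-off described above is visible: lowering `min_matching_ngrams` or raising `max_gap` lets very short articles through, but it also admits chance overlaps between unrelated passages.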

All things considered, we were quite happy with the aligner’s performance given the complexity of the comparison task and the multiple potential variations between historical text and modern machine translations. To give an example of how fine-grained and at the same time highly flexible our matching parameters needed to be, consider the article ‘Gynaecocracy’ (fig. 2), a fairly direct translation on a rather specialised subject that nonetheless matched on only 8 content words.

Figure 2. Comparisons of the article ‘Gynaecocracy’.

Other straightforward articles were, however, missed because of differences in the translation and sparse matching n-grams. See, for example, the small article on ‘Occult’ lines in geometry below, where the 6 matching words weren’t enough to constitute a match for the aligner (fig. 3).

Figure 3. Comparisons of the geometry article ‘Occult’.

Obviously this is a rather inexact science, reliant on an outside process of automatic translation and the ability to match a virtual text that in reality never existed. Nonetheless the 81% recall rate we attained on our sample corpus seemed more than sufficient for this experiment and allowed us to move forward towards a more general evaluation of the entirety of identified matches.

Once settled on the optimal parameters, we then used Text-PAIR to generate both an alignment database, for interactive examination, and a set of static files. Both of these result formats are used for this project. The alignment database contains some 7304 aligned passage pairs. The system allows queries on metadata, such as author and article title, as well as on words or phrases found in the aligned passages. The system also uses faceted browsing to allow the user to summarize results by the various metadata (for more on this, see Note below). Each aligned passage is presented as a facing page representation and the user can toggle a display of all of the variations between the two aligned passages. As seen below, the variations between the texts can be extensive (fig. 4).

Figure 4. Text-PAIR interface showing differences in the article ‘Air’.

Text-PAIR also contextualises results back to the original document(s). For example, the following is the article ‘Almanach’ by D’Alembert, showing the aligned passage from Chambers in blue (fig. 5).

Figure 5. Article ‘Almanach’ with shared Chambers passages in blue.

In this instance, D’Alembert reused almost all of Chambers’ original article ‘Almanac’, with some minor variations, but does not appear to have indicated the source of the first part of his article (page image).

The alignment database is a useful first pass to examine the results of the alignment process, but it is limited in at least two ways. It identifies each aligned passage, but does not merge multiple passages identified in article pairs. Thus we find 5 shared passages between the articles ‘Constellation’. The interface also does not attempt to evaluate the alignments or identify passages that occur between different articles. For example, D’Alembert’s article ‘ATMOSPHERE’ indeed has a passage from Chambers’ article ‘Atmosphere’, but also many longer passages from the article ‘Generation’.

To accumulate results and to refine evaluation, we subsequently processed the raw Text-PAIR alignment data as found in the static output files. We developed an evaluation algorithm for each alignment, with parameters based on the length of the matching passages and the degree to which the headwords were close matches. This simple evaluation model eliminated a significant number of false positives, which we found were typically short text matches between articles with different headwords. The output of this algorithm resulted in two tables, one for matches that were likely to be valid and one that was less likely to be valid, based on our simple heuristics – see a selection of the ‘YES’ table below (fig. 6). We are, of course, making this distinction based on the comparison of the machine translated Chambers headwords and the headwords found in the Encyclopédie, so we expected that some valid matches would be identified as invalid.
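A heuristic of this kind might look something like the sketch below. The function names, the similarity measure (the standard library’s `difflib.SequenceMatcher`), and both thresholds are assumptions made for illustration; the actual evaluation algorithm and its parameter values are not given here.

```python
from difflib import SequenceMatcher


def headword_similarity(translated_headword: str, encyclopedie_headword: str) -> float:
    """Return a 0..1 similarity ratio between two headwords, ignoring case."""
    return SequenceMatcher(
        None, translated_headword.lower(), encyclopedie_headword.lower()
    ).ratio()


def classify_alignment(
    translated_headword: str,
    encyclopedie_headword: str,
    passage_length: int,
    min_similarity: float = 0.8,   # assumed threshold
    long_passage: int = 100,       # assumed threshold, in words
) -> str:
    """Sort an alignment into the 'YES' or 'NO' table.

    Short matches between articles with different headwords are treated as
    probable false positives; close headwords or long passages are kept.
    """
    close_headwords = (
        headword_similarity(translated_headword, encyclopedie_headword)
        >= min_similarity
    )
    if close_headwords or passage_length >= long_passage:
        return "YES"
    return "NO"
```

Under these assumed thresholds, a short passage shared by ‘Génération’ and ‘ATMOSPHERE’ would land in the ‘NO’ table while a long one would be kept, which is why entries rejected by such a rule still need a human check.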

Figure 6. Table of possible article borrowings.

The next phase of the project included the necessary step of human evaluation of the identified matches. While we were able to reduce the work involved significantly by generating a list of reasonably solid matches to be inspected, there is still no way to eliminate fully the ‘arduous toil’ of comparison referenced by Lough. More than 5000 potential matches were scrutinised, looking in essence for ‘false negatives’, i.e., matches that our evaluation algorithm classed as negative (based primarily on differences in headword translations) but that were in reality valid. The results of this work were then merged into a single table of what we consider to be valid matches, a list that includes some 3700 Encyclopédie articles with at least one matching passage from the Cyclopaedia. These results will form the basis of a longer article that is currently in preparation.

Conclusions

In all, we found some 3778 articles in the Encyclopédie that upon evaluation seem highly similar in both content and structure to articles in the 1741 edition of Chambers’ Cyclopaedia. Whether or not these articles constitute real acts of historical translation is the subject for another, or several other, articles. There are simply too many outside factors at play, even in this rather straightforward comparison, to make blanket conclusions about the editorial practices of the encyclopédistes based on this limited experiment. What we can say, however, is that of the 1081 articles that include a ‘Chambers’ reference in the Encyclopédie, we only found 689 with at least one matching passage. Obviously this recall rate of 63.7% is well below the 81% we attained on our sample corpus, probably due to overfitting the matching algorithm to the sample, which warrants further investigation. But beyond testing this ground truth, we are also left with the rather astounding fact of 3089 articles with no reference to Chambers whatsoever, all of which seem, at first blush, to be at least somewhat related to their English predecessors.
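The figures in this paragraph follow from simple arithmetic on the counts reported above, which can be checked in a few lines. The variable names are ours, and the calculation assumes, as the paragraph implies, that the 689 matched articles carrying a Chambers reference are a subset of the 3778 matched articles.

```python
# Counts reported in the text.
articles_citing_chambers = 1081   # Encyclopédie articles with a 'Chambers' reference
citing_with_match = 689           # of those, articles with at least one matching passage
all_matched_articles = 3778       # all articles judged similar after evaluation

# Recall against the 'ground truth' of explicit Chambers references.
recall = citing_with_match / articles_citing_chambers

# Matched articles that carry no reference to Chambers at all.
unattributed = all_matched_articles - citing_with_match

# Recall on the 100-article tuning sample, for comparison.
sample_recall = 81 / 100

print(f"Recall on explicit references: {recall:.1%}")     # 63.7%
print(f"Recall on tuning sample: {sample_recall:.0%}")    # 81%
print(f"Matched articles without a reference: {unattributed}")  # 3089
```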

The overall evaluation of these results remains ongoing, and the ‘arduous toil’ of traditional textual comparison continues apace, albeit guided somewhat by the machine’s heavy hand. Indeed, the use of machine translation as a bridge between documents to find similar passages, be they reuses, plagiarisms, etc., is, as we have attempted to show here, a workable approach for future research, although not without certain limitations. The Chambers–Encyclopédie task outlined above is fairly well constrained and historically bounded. More general applications of these same methods may well yield less useful results. These reservations notwithstanding, the fact that we were able to unearth many thousands of valid potential intertextual relationships between documents in different languages is a feat that even a few years ago might not have been possible. As large-scale language models become ever more sophisticated and historically aware, the dream of intertextual bridges between multilingual corpora may yet become a reality. (For more on ‘intertextual bridges’ in French, see our current NEH project.)

Note

The question of the Dictionnaire de Trévoux is one such factor, as it is known that both Chambers and the encyclopédistes used it as a source for their own articles – so matches we find between the Chambers and Encyclopédie may indeed represent shared borrowings from the Trévoux and not a translation at all. Or, more interestingly, perhaps Chambers translated a Trévoux article from French to English, which a dutiful encyclopédiste then translated back to French for the Encyclopédie – in this case, which article is the ‘source’ and which the ‘translation’? For more on these particular aspects of dictionary-making, see our previous article ‘Plundering philosophers: identifying sources of the Encyclopédie’, Journal of the Association for History and Computing 13.1 (Spring 2010) and Marie Leca-Tsiomis’ response, ‘The use and abuse of the digital humanities in the history of ideas: how to study the Encyclopédie’, History of European ideas 39.4 (2013), p.467-76.

– Glenn Roe and Mark Olsen

Exploring Voltaire’s letters: between close and distant readings

‘La lettre au fil du temps: philosophe.’

A stamp produced by the French post office in 1998 celebrates the art of letter-writing by depicting Voltaire writing letters with both hands. It’s true that Voltaire wrote a lot of letters – over 15,000 are known, and more turn up all the time – but even so it’s not altogether clear that an ambidextrous letter-writer is someone we entirely want to trust. Voltaire’s correspondence is full of difficulties and traps, and faced by such a huge corpus, it is hard to know where to start. Without question, the Besterman ‘definitive’ edition (1968-77), digitised in Electronic Enlightenment, has had a major impact on Enlightenment scholarship: historians and literary critics make frequent use of these letters, but usually in an instrumental way, adducing a single passage in a letter as evidence in support of a date or an interpretation.

Nicholas Cronk and Glenn Roe, Voltaire’s correspondence: digital readings (CUP, 2020).

Voltaire’s letters can be notoriously ‘unreliable’, however, and they really need to be read and interpreted – like all his texts – as literary performances. Few critics have attempted to examine the corpus of the correspondence in its entirety and to understand it as a literary whole. In our new book, Voltaire’s correspondence: digital readings, we have experimented with a range of digital humanities methods, to explore to what extent they might help us identify new interpretative approaches to this extraordinary correspondence. The size of the corpus seems intimidating to the critic, but it is precisely this that makes these texts a perfect test-case for digital experimentation: we can ask questions that we would simply not have been able to ask before.

For example, we looked at the way Voltaire signs off his letters – and were surprised to find that only 13% of the letters are actually signed ‘Voltaire’; while over a third of the letters are signed with a single letter, ‘V’. Then Voltaire is hugely inventive in the way he plays with the rules of epistolary rhetoric, posing as a marmot to the duc de Choiseul. And if you want to know why in a letter (D18683) to D’Alembert he signs off ‘Miaou’, the answer is to be found in a fable by La Fontaine…

We studied Voltaire as a neologist. Critics have usually described Voltaire as an arch-classicist adhering rigorously to the norms of seventeenth-century French classicism. True, yet at the same time he is hugely energetic in coining new words, an aspect of his literary style that has been insufficiently studied. Here, corpus analysis tools, coupled with available lexicographical digital resources, allow us to consider Voltaire’s aesthetic of lexical innovation. In so doing, we can test the hypothesis that Voltaire uses the correspondence as a laboratory in which he can experiment with new formulations, ideas, and words – some of which then pass into his other works. We identified 30 words first coined by Voltaire in his letters, and another 36 words first used in his other works, many of which are then reused in the correspondence. Emmanuel Macron has encouraged the description of himself as a ‘président jupitérien’, so it’s good to discover that ‘jupitérien’ is one of the words first coined by Voltaire.

A letter in Voltaire’s hand, sent from the city of Colmar to François Louis Defresnay (D5612, dated 1753/1754).

A reader of Voltaire’s letters cannot fail to be struck by the frequency of his literary quotations. We explore this phenomenon through the use of sequence alignment algorithms – similar to those used in bioinformatics to sequence genetic data – to identify similar or shared passages. Using the ARTFL-Frantext database of French literature as a comparison dataset, we attempt a detailed quantification and description of French literary quotations contained in Voltaire’s correspondence. These citations, taken together, give us a more comprehensive understanding of Voltaire’s literary culture, and provide invaluable insights into his rhetoric of intertextuality. No surprise that he quotes most often the authors of ‘le siècle de Louis XIV’, though it was a surprise to find that Les Plaideurs is the Racine play most frequently cited. And who expected to find two quotations from poems by Fontenelle (neither of them identified in the Besterman edition)?! Quotations in Latin also abound in Voltaire’s letters, many of these drawn, predictably enough, from the famous poets he would have memorised at school, Horace, Virgil, and Ovid – but we also identified quotations, hitherto unidentified, from lesser poets, such as a passage from Manilius’ Astronomica. By examining as a group the correspondents who receive Latin quotations, and assigning to them social and intellectual categories established by colleagues working at Stanford, we were able to establish clear networks of Latin usage throughout the correspondence, and confirm a hunch about the gendered aspect of quotation in Latin: Voltaire uses Latin only to his élite correspondents, and even then, with notably rare exceptions such as Emilie Du Châtelet, only to men.

The woman on the left, a trainee pilot in the Brazilian air force, is an unwitting beneficiary of Voltaire’s bravura use of Latin quotation. The motto of the Air Force Academy is a stirring (if slightly macho) Latin quotation: ‘Macte animo, generose puer, sic itur ad astra’ (Congratulations, noble boy, this is the way to the stars). The quotation is one that Voltaire uses repeatedly in some dozen letters, and it is found later, for example in Chateaubriand’s Mémoires d’outre-tombe. On closer investigation it turns out that this piece of Latin is an amalgam of quotations from Virgil and Statius – in effect, a piece of pure Voltairean invention.

In the end, Voltaire’s correspondence is undoubtedly one of his greatest literary masterpieces – but it is arguably one that only becomes fully legible through the use of digital resources and methods. Our intention with this book was to affirm the simple postulate that digital collections – whether comprised of letters, literary works, or historical documents – can, and should, enable multiple reading strategies and interpretative points of entry; both close and distant readings. As such, digital resources should continue to offer inroads to traditional critical practices while at the same time opening up new, unexplored avenues that take full advantage of the affordances of the digital. Not only can digital humanities methods help us ask traditional literary-critical questions in new ways – benefitting from economies of both scale and speed – but, as we show in the book, they can also generate new research questions from historical content; providing interpretive frameworks that would have been impossible in a pre-digital world.

The size and complexity of Voltaire’s correspondence make it an almost ideal corpus for testing the two dominant modes of (digital) literary analysis: on the one hand, ‘distant’ approaches to the corpus as a whole and its relationship to a larger literary culture; on the other, fine-grained analyses of individual letters and passages that serve to contextualise the particular in terms of the general, and vice versa. The core question at the heart of the book is thus one that remains largely untreated in the wider world: how can we use digital ‘reading’ methods – both close and distant – to explore and better understand a literary object as complex and multifaceted as Voltaire’s correspondence?

– Nicholas Cronk & Glenn Roe, Co-directors of the Voltaire Lab at the VF

Voltaire’s correspondence: digital readings will be published in print and online at the end of October. The online version is available free of charge for two weeks to personal and institutional subscribers.

Digitizing the Enlightenment

As country after country has gone into COVID-19 lockdown, we have all had to learn to communicate, network, teach, study and relate online in ways unimaginable a few short years – or even months – ago. This phenomenon is just the latest stage in the information-technology revolution and part and parcel of the ongoing development of an increasingly digital society. This revolution has touched almost every aspect of our lives, from how we work, study, shop and relax to how we make and maintain personal relationships. But it is also transforming scholarship and the way we conduct and communicate academic research. Thus, it is perhaps apt, and with consummate good timing, that Oxford University Studies in the Enlightenment has chosen to subject tag our new volume as ‘History of Scholarship (Principally of Social Sciences and Humanities)’. Yet this is certainly not how we and our collaborators envisaged our project at the outset, nor can any single tag capture the content of our volume and its collaborative agenda in its entirety.

The Digitizing Enlightenment workshop logo, designed by Evan Casey for the Voltaire Foundation, featured on the cover of Digitizing Enlightenment.

Ironically, as we write, Digitizing Enlightenment is also a living movement – or at least a loose network of scholars who meet annually in pursuit of a common agenda. That agenda was born in a series of conversations that took place from 2010, culminating in Dan Edelstein’s post-panel suggestion at the American Historical Association conference at Montreal in April 2014 that we should hold periodic meetings between like-minded digital projects relating to the Enlightenment. The aim of these meetings would be to establish common conventions and digital standards, with a view to linking our resources and realising the enormous and still largely untapped potential of Linked Open Data. Those present for Dan’s suggestion – Simon Burrows, Jeff Ravel, Sean Takats and Dan himself – have all provided chapters for our book, but much of the energy behind Digitizing Enlightenment since has come from Glenn Roe, whom Simon had first encountered a month earlier in Australia, where they had both recently taken up academic positions.

It was this fortuitous coincidence, underpinned by the fertile combination of Simon’s professorial establishment funds and Glenn’s energy, together with their mutual contact books, that led to Western Sydney University hosting the first Digitizing Enlightenment symposium in July 2016. Among the projects discussed there, and in our book, were large-scale treatments of Enlightenment correspondences, theatre attendance records, and textual corpora including the mid-eighteenth-century Encyclopédie; bibliometric projects on the production and dissemination of literature; and presentations on mapping and data visualization growing out of these projects. The symposium was so well received that it has been an annual event ever since. It was held at Radboud University in Nijmegen (2017), Oxford (2018), and Edinburgh (2019). In 2020, but for COVID-19, it would have been held in Montpellier.

It was not entirely by chance that such a project coalesced around the guiding notion of the ‘Enlightenment’. For the long eighteenth century has been blessed by a number of high-profile and long-established digital projects. These include ground-breaking commercial datasets such as Gale-Cengage’s Eighteenth-Century Collections Online (ECCO), which features in several of our chapters, semi-commercial projects such as the Electronic Enlightenment and large academic consortiums such as the Franco-American ARTFL project. This made the Enlightenment a natural laboratory for exploring the possibilities and achievements of the Digital Humanities for transforming scholarship on a single historical era. Further, as our book emphasises, our discussions built on a long tradition of digital innovation in eighteenth-century studies that can be traced back at least as far as the twin Livre et société dans la France du XVIIIe siècle volumes produced by a team led by François Furet in 1965 and 1970. It might further be added that our over-arching subject material lends itself to digital-historical analysis; the Enlightenment might after all be viewed as the long-run culmination of the intellectual turmoil and – as several contributors point out – information overload unleashed by a previous technological and communications revolution.

Digitizing Enlightenment is the July volume in the Oxford University Studies in the Enlightenment series.

With this in mind, then, we offer up Digitizing Enlightenment: Digital Humanities and the Transformation of Eighteenth-Century Studies as rather more than a contribution to the history of scholarship. Certainly, we have offered a sample of Digital Humanities c. 2016-2020, as it relates to the technologies available and their application to Enlightenment studies broadly construed. In addition, the first half of the book offers detailed accounts of the origins and development of key Enlightenment digital projects up until that point, accompanied by valuable and sometimes disarming insights on the dangers and delights of digital research from foremost practitioners in the field. These chapters, as well as some later contributions, are helping to reshape some dominant meta-narratives of the Enlightenment, not least by hinting simultaneously at the enduring aristocratic leadership of the French Enlightenment and the extent to which Enlightenment literary production and consumption was infused with religious content. However, our contributors also showcase other ways that Digital Humanities scholarship is in the process of changing the field through the transparency, methodological rigour, and collaborative imperatives that are necessary concomitants of this new kind of research. Finally, the book offers a collaborative roadmap for future digital research – at a moment when, as our final contributor, Sean Takats, points out, the Enlightenment is fast losing its privileged position as the most richly digitized century of the modern era. As a corollary, we hope that our volume may be as useful to scholars of other periods as to Enlightenment scholars themselves.

– Simon Burrows (Western Sydney University) and Glenn Roe (Sorbonne University)

Simon Burrows and Glenn Roe are the editors of the July volume in the Oxford University Studies in the Enlightenment series, Digitizing Enlightenment: Digital Humanities and the Transformation of Eighteenth-Century Studies, which is the first book-length survey of the impact of digital humanities on our understanding of a key historical period and paradigm.

This post is reblogged from Liverpool University Press.

Interview with Nicholas Cronk and Glenn Roe

For those who missed it first time round, here is another chance to read this interview with Glenn Roe and Nicholas Cronk, first published last January.

Glenn Roe and Nicholas Cronk.

Where does the Voltaire Foundation’s publication of the Œuvres complètes de Voltaire stand?

Nicholas Cronk

The publication of the Œuvres complètes de Voltaire was begun in the 1960s by Theodore Besterman, who had just completed the edition of a gigantic correspondence of more than twenty thousand letters. The edition that was, so to speak, authoritative was still that of Beaumarchais and Condorcet, printed at Kehl (1784-1785), because the major editions that succeeded it in the nineteenth century, such as that of Louis Moland (1877-1885), retain its organisation. Yet the Kehl edition is a monument to the memory of Voltaire and not truly a critical edition. The chronological organisation adopted by the Voltaire Foundation, at the suggestion of William H. Barber, has made it possible to avoid, for example, certain pitfalls of classification by genre, which makes sense for the historical works, the tragedies and La Henriade, but which condemns the short prose narratives, which Voltaire called ‘fusées volantes’, to appear in volumes of miscellanies. The Voltaire Foundation edition restores these texts, which are anything but minor, to their rightful place. It will be completed in autumn 2020. We are currently working, for example, on the edition of the Siècle de Louis XV, which has never had a scholarly edition, on the Annales de l’Empire and on the Lettres philosophiques, which are better known.

What is the link between the Œuvres complètes and the Digital Voltaire project?

Nicholas Cronk

Publishing Voltaire's complete works is an endless task, and a digital edition quite simply offers the advantage that it can be updated regularly without requiring considerable resources. Digital publication also makes it possible to imagine a new kind of critical edition, a modern one, offering a novel thematic, generic and chronological arrangement, enriched with hyperlinks, supplementary texts, images, music (since Voltaire's poems were sometimes set to music), and so on. Such an edition should make researchers' work easier: Voltaire, for example, readily practised self-plagiarism, a phenomenon that has been little studied and that the Kehl editors obscured by removing repetitions they found unseemly. Yet repetition, in Voltaire, is a genuine aesthetic, and at the end of his life he would return to texts from his youth, sometimes pretend not to know that he was their author, correct them, and so on. Sequence alignment techniques make it easy to bring this aspect of his writing back to life. Digital methods should also allow us to rethink key notions in Voltaire's thought such as atheism or tolerance, which may have evolved over time, and to understand his political position at a given period, or the reasons for his interest in jurisprudence at the end of his life. We should be able to move beyond the traditional, somewhat rigid opposition between Voltaire and Rousseau and beyond the monolithic reading offered, for example, by the eight-volume Dictionnaire philosophique of the Kehl edition, which is made up of texts written over forty or fifty years that Voltaire had never thought of bringing together.

Glenn Roe

The Digital Voltaire label brings together a set of projects intended, in time, to enrich the digital edition of Voltaire's complete works. The research programme, which will be set in the course of 2019, will symbolically take over from the print edition. The projects concern intertextuality, authorities, phenomena of reuse, and the main themes of Voltaire's thought, which we study using topic modeling and mapping techniques. Word vectorisation should allow us to understand better the evolution of Voltaire's philosophical thought. We should manage to develop a kind of ontology or intellectual cartography of Voltaire, which can then be compared with that of Rousseau or of other eighteenth-century authors published by the Voltaire Foundation.

What are the Voltaire Foundation's priorities in the field of digital humanities?

Nicholas Cronk

It is certain that a digital project bringing together the works and correspondence of several eighteenth-century authors, and giving researchers the benefit of the new possibilities offered by tools developed within the digital humanities, is far from unachievable and has much to recommend it. An experiment of this kind was carried out on authors' correspondence in the 2000s, within the Electronic Enlightenment project, which brings together some seventy thousand letters in several languages. But I would say that the most immediate challenge, for us and for Digital Voltaire, is today to succeed in developing this digital humanities research laboratory, which will foster research on Voltaire's work and its reception while remaining the critical edition of reference. This project is a model of what we could do at the Voltaire Foundation in the years to come, in collaboration with other partners such as the Sorbonne.

– Interview by Romain Jalabert

The above post is reblogged from Observatoire de la vie littéraire, where it first appeared on 26 January 2019.

Voltaire Lab: new digital research tools and resources

As part of our efforts to establish the Voltaire Lab as a virtual research centre, we are pleased to announce a major update of the TOUT Voltaire database and search interface, expanded links with the ARTFL Encyclopédie Project, and several new research databases made available for the first time. Working in close collaboration with the ARTFL Project at the University of Chicago – one of the oldest and best-known North American centres for digital humanities research – we have rebuilt the TOUT Voltaire database under PhiloLogic4, ARTFL’s next-generation search and corpus analysis engine.

New Search interface for TOUT Voltaire

PhiloLogic4 is a powerful research tool, allowing users to browse Voltaire’s works dynamically by date or title, with further faceted browsing by ‘title’, ‘year’ and ‘genre’ that can be combined with word and phrase searching. Word searches are now more flexible and easier to read, and include four primary result reports:

  • Concordance, or search terms in their context
  • KWIC, or line-by-line occurrences of the search term
  • Collocation, or terms that co-occur most with the search term
  • Time Series, which displays search term frequency over time

The new search interface will allow users to formulate complex queries with relatively little effort, following lines of enquiry in a dynamic fashion that moves from ‘distant reading’ scales of exploration to more fine-grained close textual analysis.
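
As a rough illustration of what two of the reports listed above compute, here is a minimal Python sketch of a KWIC display and a collocation count. It is not PhiloLogic4's implementation or API: the function names `tokenize`, `kwic` and `collocates`, the three-word window, and the short sample string (invented for the example) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration only; not part of PhiloLogic4.
TOKEN = re.compile(r"\w+")

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())

def kwic(tokens, term, width=3):
    """Return (left, keyword, right) context triples for each hit of `term`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == term:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

def collocates(tokens, term, width=3):
    """Count the words found within `width` tokens of each hit of `term`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            window = tokens[max(0, i - width):i] + tokens[i + 1:i + 1 + width]
            counts.update(window)
    return counts

# Invented sample sentence, used only to exercise the functions.
sample = "Il faut cultiver notre jardin. Notre jardin est petit, mais il faut le cultiver."
tokens = tokenize(sample)

# One line per hit, with the keyword aligned in a central column.
for left, key, right in kwic(tokens, "jardin"):
    print(f"{left:>25} [{key}] {right}")

# The most frequent neighbours of the search term.
print(collocates(tokens, "jardin").most_common(3))
```

A production search engine would work from a prebuilt index rather than scanning tokens on every query, and would normally filter out function words before ranking collocates; the sketch leaves both out to keep the logic visible.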

TOUT Voltaire search results

Also in collaboration with ARTFL, we have just released the Autumn Edition 2017 of the ARTFL Encyclopédie, a flagship digital humanities project that for almost twenty years has made available online the full text of Diderot and d’Alembert’s great philosophical dictionary. This new release offers many new features, functionalities and improvements. The powerful new faceted search and browse capabilities offered by PhiloLogic4 allow users to make better use of the organisational structure of the Encyclopédie – classes of knowledge, authors, headwords, volumes, and the like. They also give users the possibility of exploring the interesting alternatives offered by algorithmically or machine-generated classes. The collocation search generates word clouds or word lists in which any word can be clicked to obtain its concordances immediately. Further improvements include new author attributions, various text corrections, and better cross-referencing functionality.

New ARTFL Encyclopédie interface

This release also contains a beautiful new set of high-resolution plate images. Clickable thumbnail versions lead to larger images that can be viewed in much greater detail than was previously possible.

New high resolution plate images, ‘Imprimerie en taille douce’

Close up of plate image

Thanks to the Voltaire Foundation, full biographies of the encyclopédistes are directly accessible from within the ARTFL Encyclopédie simply by clicking on the name of the author of any given article. This information is drawn directly from Frank and Serena Kafker’s The Encyclopedists as Individuals: A Biographical Dictionary of the Authors of the Encyclopédie (SVEC 257, 1988) – still the standard reference for biographical information on the Encyclopédie’s 139 contributors. Our hope is that this first experiment will demonstrate the value of linking digital resources openly in ways that can add value to existing projects and, at the same time, increase the visibility of the excellent works contained in the Oxford University Studies in the Enlightenment back catalogue.

Finally, we have begun the work of establishing new research collections that will form the basis of the Voltaire Lab’s textual corpus. For example, working with files provided by Electronic Enlightenment, we have combined all of Voltaire’s correspondence with TOUT Voltaire. This new resource, which we are for the moment calling ‘TV2’, contains over 22,000 individual documents and more than 13 million words, making it one of the largest single-author databases available for research. Because of copyright restrictions on the correspondence files, we cannot make the full dataset publicly available; however, we are keen to give researchers access to this important resource on a case-by-case basis. Students and scholars who wish to access the PhiloLogic4 build of TV2 should contact me here.

Glenn Roe

Tout Voltaire

The Voltaire Foundation, in collaboration with the ARTFL Project, is pleased to announce the public release of the TOUT VOLTAIRE online database. This database brings you in fully searchable form all of Voltaire’s works apart from his correspondence (which can be searched separately, in Electronic Enlightenment).

Currently publishing the Complete works of Voltaire in print, the Voltaire Foundation plans to unveil an online version of this definitive critical edition sometime after 2018. In the meantime, this plain text version of Voltaire’s writings (without critical apparatus or notes) is the most reliable version available anywhere on the web.

The various editions used to establish this database are clearly marked: from the Voltaire Foundation’s own Complete works of Voltaire to nineteenth-century editions by Beuchot and Moland, among others. When possible we have included Voltaire’s notes, as well as some textual variants depending on the edition. Pagination, however, is often not representative of the print editions, so if you wish to cite Voltaire for scholarly purposes, you should always consult the list of the best critical editions currently available.

The TOUT VOLTAIRE database is built using ARTFL’s full-text search and retrieval engine PhiloLogic, one of the oldest and most successful text analysis systems in the digital humanities. With a wide variety of search and reporting functions, users can look for words, groups of words, or phrases over Voltaire’s entire corpus, or in individual works (and even parts of works). Results can be displayed in context, as frequency reports (by title, by decade, etc.), or as a collocation table and word cloud.
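
To make the idea of a frequency report by decade concrete, here is a minimal Python sketch. It does not reproduce PhiloLogic's own code or data model: the function name `frequency_by_decade`, the representation of the corpus as `(year, text)` pairs, and the three placeholder documents (invented strings, not quotations from Voltaire) are assumptions made for the example.

```python
from collections import Counter

def frequency_by_decade(docs, term):
    """Map each decade to (hits for `term`, total words in that decade).

    `docs` is a hypothetical list of (year, text) pairs.
    """
    hits = Counter()
    totals = Counter()
    for year, text in docs:
        decade = (year // 10) * 10      # e.g. 1734 -> 1730
        words = text.lower().split()
        totals[decade] += len(words)
        hits[decade] += words.count(term)
    return {d: (hits[d], totals[d]) for d in sorted(totals)}

# Invented placeholder documents, used only to exercise the function.
docs = [
    (1734, "raison liberté raison"),
    (1738, "raison nature"),
    (1763, "tolérance raison tolérance"),
]

for decade, (count, total) in frequency_by_decade(docs, "raison").items():
    print(f"{decade}s: {count} of {total} words ({count / total:.0%})")
```

Reporting the total word count alongside the raw hits matters because the amount of text varies from decade to decade, so raw counts alone can mislead.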

Example searches could include:

For more search tips, please visit the PhiloLogic user manual.

This research tool is made available free of charge by the Voltaire Foundation (University of Oxford) and the ARTFL Project (University of Chicago). If you wish to make a contribution to our work, please contact the Voltaire Foundation.

Glenn Roe