Beaumarchais letters: editorial history and current research

The recent addition to Electronic Enlightenment (EE) of 417 letters from the Beaumarchais correspondence is a significant event in 18th-century studies. They appeared over thirty years ago in the two-volume edition prepared by Gunnar and Mavis von Proschwitz, Beaumarchais et le Courier de l’Europe, for the Studies on Voltaire and the eighteenth century, volumes 273–274 (1990). Added to the 257 Beaumarchais letters already included in EE, these 674 letters constitute over a sixth of known Beaumarchais letters and approximately one third of Beaumarchais letters published to date. Their online publication, along with other current research projects on the correspondence, offers scholars new reasons to consider this oft-cited, but still little understood, figure of the Enlightenment.

A vast and far-ranging correspondence

If ever fully inventoried and edited, the Beaumarchais papers would no doubt include between 6000 and 20,000 documents. (The minimum estimate is based on the currently known corpus. The maximum is a seat-of-the-pants guess put forth by Brian Morton in 1969, based on his preliminary archival research. The actual number certainly lies somewhere in between, which still makes the corpus one of the largest of the period.) Beyond their sheer number, the Beaumarchais papers also stand out for their geographical and sociological breadth. From Vienna to Madrid to the Netherlands to England and North America, Beaumarchais’s correspondence network is far more than simply a ‘French’ or ‘francophone’ one. Moreover, Beaumarchais grants us insights into the 18th century that stand apart from those offered by the correspondences of other major figures. As an artisan, musician, financier, commercial entrepreneur, printer, investor, politician, judge, diplomat, spy, litigant, criminal (he was imprisoned in at least four capitals), husband, lover, brother, father and, of course, playwright, he was brought by his correspondence into contact with a wider swath of 18th-century European and North-American society than almost any other personality whose correspondence has been studied to date, with perhaps only Benjamin Franklin and Thomas Jefferson rivalling him in this respect.

Editorial history

The editorial history of the Beaumarchais correspondence stretches across more than two centuries of literary and political history. Since Beaumarchais’s death in 1799, over 1500 letters have been edited, of which only slightly more than half feature a supporting critical apparatus.

Portrait of P. A. Caron de Beaumarchais, 1773, drawn by Charles Nicolas Cochin II, engraved by Augustin de Saint-Aubin. (The Metropolitan Museum of Art)

In the 19th century, fewer than 200 Beaumarchais letters were printed, mostly in editions of his works, but also in journals and biographies. The first edition of his complete works, edited by his amanuensis, Paul-Philippe Gudin de La Brenellerie (1809), included 55 letters, which Gudin had transcribed from the personal papers inherited by the writer’s widow upon his death. A second edition, by the journalist, historian and politician Saint-Marc Girardin, published in 1828, included 53 of the same letters, though with some editorial differences. An edition prepared in 1836 by the deputy curator at the Bibliothèque du roi, Jules Ravenel, included 10 letters reproduced from 18th-century periodicals, of which 6 were not published in either of the earlier editions. Also in 1836, the Revue rétrospective published a collection of 29 previously unpublished letters from manuscripts in the Comédie Française archives. The biographer Louis de Loménie, in his two-volume Beaumarchais et son temps (1858), referenced and included partial transcripts of hundreds of letters, but included in the appendix only 35 complete texts of previously unedited letters. A second biographer, Eugène Lintilhac, in his Beaumarchais et ses œuvres (1887), included 12 partially transcribed letters not previously published. (In 1890, Louis Bonneville de Marsangy published Madame de Beaumarchais, a biography of Beaumarchais’s third and final wife and widow, Marie Thérèse Willermaulaz; although Marsangy claimed to have consulted ‘sa correspondance inédite’, no letters are reproduced or directly referenced in the volume.)

In the 1920s, another 200 letters were brought into print from a variety of sources. In the early years of the century, as a young and ambitious man of letters, Louis Thomas undertook to produce a complete edition of the correspondence. However, military service during the Great War put an end to his research. In 1923, he published an edition entitled Lettres de jeunesse, including 167 letters from the first two decades of Beaumarchais’s adult life, of which 120 are attributed to manuscripts in the ‘Archives de Beaumarchais’ and the rest to printed sources. At least 80 of these had not been edited in earlier collections. (Thomas achieved renown as an editor and author in the interwar period before falling into ignominy during the Occupation as an ardent antisemite and collaborator whom the Vichy regime put in charge of the publishing house seized from Gaston Calmann-Lévy.) In 1929, the eminent French literature scholar in the United States Gilbert Chinard edited a collection of Lettres inédites de Beaumarchais consisting of 109 letters, mainly to Marie Thérèse Willermaulaz and their daughter, transcribed from manuscripts acquired by the Clements Library at the University of Michigan.

In the past half-century, the pace of publication has accelerated. In the late 1960s, Brian Morton (then a faculty member at the University of Michigan) launched a project to publish a complete Correspondence and began to transcribe letters from both public and private collections as well as reproduce previously published letters. In the 1970s, Donald Spinelli, then of Wayne State University (in Detroit, MI), became his collaborator and continued the project. Together they published about 1000 letters, of which at least 300 were previously unpublished. Four published volumes (1969–1978) cover the years up to 1778 and are now available on open access. In 2010, Spinelli added a fifth volume, covering the year 1779, also on his professional website.

In 1990, Gunnar von Proschwitz, a noted philologist, and his wife Mavis published the most extensive critical apparatus associated with any edition of Beaumarchais letters. The notes and a lengthy introduction to this edition lay out the significance of these documents for our understanding of Beaumarchais’s life and of the 18th century. In these letters, we see Beaumarchais not only as a playwright seeking to circumvent censorship to have Le Mariage de Figaro finally staged, but also as an entrepreneur, a printer, an urban property owner, an emissary, and a transatlantic merchant. Through these documents we have a window on an 18th century that is geographically, socially, and culturally much broader and more diverse than what we generally encounter through other published 18th-century correspondences.

Current research

A letter from Beaumarchais to Antoine Dauvergne, director of the Académie royale de musique, dated 7 August 1787, about Salieri’s opera Tarare (with a libretto by Beaumarchais). (Gallica)

At present, the scholarly world can look forward to the benefits of the first new projects on Beaumarchais’s correspondence in over thirty years, including the effort spearheaded by Linda Gil to produce a definitive inventory with a material bibliography. Gil is also the editor of a forthcoming volume, Éditer la correspondance de Beaumarchais (to be published in the Cahiers du Centre d’étude des correspondances et journaux intimes), and one of the organisers of a conference on ‘L’Europe de Beaumarchais’, to be held in Paris and online on 20 and 21 January 2023.

My own contribution to this effort, begun in collaboration with Spinelli in 2019, is to prepare a searchable dataset of the 3500 documents and nearly 5000 references to letters known and unknown, with which to analyse Beaumarchais’s transatlantic network of correspondents. To date, nearly 3780 named identities have been extracted, of which 980 are unique individuals, and another 500 corporate entities have been identified. Working in collaboration with a talented doctoral student, Dakota Ciolkosz, with Voltaire Foundation colleagues who have extensive expertise in scholarly editing of correspondence, with Miranda Lewis and Howard Hotson of Early Modern Letters Online, and with Glenn Roe, whose ‘ObTIC’ laboratory of Sorbonne Université has done extensive work as well on 18th-century correspondences, this project will seek to make available in the coming years, on an open access and non-exclusive basis, the searchable dataset, the metadata drawn from these documents, and a prosopography of participants in the transatlantic correspondence network.

– Gregory Brown, Professor, Department of History, University of Nevada, Las Vegas; Senior Research Fellow, Voltaire Foundation, University of Oxford

An earlier version of this post appeared on the EE blog.

Editing and digitising marginalia

Voltaire’s comments on Frederick II’s L’Art de la guerre, Clement Draper’s depictions of chemical processes, Herman Melville’s pencil scores, or Samuel Beckett’s reading traces… these are all what we define as marginalia: the reader’s markings in the margins of a book. These markings are difficult to pin down in terms more specific than scribbles, references, and thoughts captured on a page. There is no apparent common rule that groups them together and specifies how they should be understood as a whole, even though they are often studied as an ensemble or a genre. Furthermore, the line – if there is a line – that defines the margins themselves is not always evident, and that is why scholars are constantly questioning what marginalia are, while trying to differentiate between the primary text and its annotations. As Laura Estill acknowledges in her article ‘Encoding the edge: manuscript marginalia and the TEI’, ‘perhaps there are easier distinctions to be made when marginalia is handwritten in printed books – although even then, in the case of authorial revisions, stop-press corrections, or (say) Whitman’s notes in another book, there is no easy answer as to what is “marginal”’.

A discussion of what exactly this marginal space is and how it interacts with the text is crucial when considering the central query of the Editing and Digitising Marginalia workshop: how can the marginalia of source material be encoded as fully, accurately, and helpfully as possible? By trying to define the purpose and character of the marginalia of four very different readers, the speakers (Nicholas Cronk, Gillian Pink, and Dan Barker on Voltaire; Zoe Screti on Draper; Christopher Ohge on Melville; and Dirk Van Hulle on Beckett) delved into the challenges of digitally editing marginalia, which requires a completely different framework of analysis compared to pre-digital editions or even digital facsimile editions. Following on from the OCTET colloquium on Writers’ Libraries, this workshop explored the importance of studying authors through their reading practices. It focused on the editorial choices behind digitally encoding marginalia, with the added layer of complexity that derives both from the difficulties and the possibilities of the digital medium.

When designing a data model that could represent marginalia as a key component of Voltaire’s complete works, for example, the verbal elements were comparatively easier to encode than the non-verbal marks. Voltaire used different materials to underline, draw, and mark the pages he was reading, or he folded, licked, and stuck them together. How can these practices possibly be translated into the digital sphere? For this digital project, the source material came from the transcribed print volumes of the Corpus des notes marginales de Voltaire, which were themselves one step removed from the original source material, since they had already undergone an editorial process that transformed the original squiggles into typeset signs.

Dan Barker, ‘The aim of digitising OCV’, picture taken by author.

Dan Barker, the Digital Consultant at the Voltaire Foundation, explained in his presentation ‘The aim of digitising OCV’ how he had created a system of mark types to record these marks in order to reproduce source material fully, accurately, and helpfully. He classified a mark according to nodes (the points where the lines meet or cross) or edges (uninterrupted lines) to convey their nature, presence, and relationship to the text. Even if the method does not account for the colour, medium, intensity, or even authorship of marginal marks, readers will be able to search for specific classifications of marks and see if Voltaire used them more than once and where. It is a process that operates within the principles proposed by Gillian Pink of what a new born-digital edition of a manuscript should be: legible, containing both visual and non-verbal elements, and searchable, taking into account the modernisation of the transcription to avoid the potential pitfalls of searching for idiosyncratic spellings.
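As a rough illustration of how a classification by nodes and edges might be stored and searched, here is a minimal Python sketch. The class name `Mark`, its fields, the type labels, and the sample records are all hypothetical; they are not drawn from the Voltaire Foundation's actual data model. The sketch only assumes that each mark is described by a count of nodes and edges and by the page on which it appears.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """A non-verbal mark described by its shape, not its colour or medium."""
    page: str   # hypothetical page reference
    nodes: int  # points where lines meet or cross
    edges: int  # uninterrupted lines

    @property
    def mark_type(self) -> str:
        # Marks with the same node and edge counts share one searchable label.
        return f"n{self.nodes}-e{self.edges}"


def index_by_type(marks):
    """Group page references by mark type so that repeated marks can be found."""
    index = defaultdict(list)
    for mark in marks:
        index[mark.mark_type].append(mark.page)
    return dict(index)


# Invented sample records, for illustration only.
sample = [Mark("p. 12", 0, 1), Mark("p. 40", 1, 2), Mark("p. 57", 0, 1)]
index = index_by_type(sample)
print(index["n0-e1"])  # pages on which the same type of mark recurs
```

Because the label depends only on shape, a search of this kind can show where a given type of mark recurs, but, as noted above, it says nothing about colour, medium, or hand.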

The issue of searchability was further discussed by Zoe Screti, a postdoctoral researcher at the Voltaire Foundation, in her paper ‘Alchemical marginalia written in prison and cataloguing marginalia’. The quantity and diversity of Clement Draper’s marginalia, in the shape of memory aids, summaries, symbols, diagrams, or eyewitness accounts, are not reflected in the catalogue entries of his archival materials. That discrepancy points towards an incompatibility between the way catalogues were built and the questions that scholars are asking now, which is why Screti is updating the system with usability and consistency in mind, both of which aim to make sources of marginalia accessible and discoverable.

She has access to a subset of Voltaire’s manuscripts and is cataloguing them from scratch, which provides her with a decision-making margin that others might not be able to work with. They are also small in size, allowing for a detailed granularity that would be difficult to obtain if working with Draper’s notebooks, for example. But the challenges of ensuring that catalogues keep up with the pace of research on marginalia remain, in big and small collections alike. If we want to be able to locate specific categories of marginalia, as is the case with Voltaire’s non-verbal markings, and include nuances in our current search and text analysis tools, they need to appear in the catalogue entries, and that means going beyond filters and single codes.

Voltaire’s non-verbal annotations to the Marquis de Vauvenargues’s Introduction à la connaissance de l’esprit humain and their appearance in the Voltaire Foundation’s edition of the marginalia.

Finally, both Melville’s and Beckett’s marginalia are representative of common methodological issues in terms of how to create a uniform TEI data model. As Christopher Ohge explained in his talk entitled ‘Melville’s Marginalia Online, with some general provocations’, there is no solution that covers all cases of marginalia encoding, and that is why current projects have very different data models. He provided an overview of those differences, showing how in Keats’s Paradise Lost, a Digital Edition or Whitman’s marginalia to Thoreau’s A week on the Concord and Merrimack Rivers, marginalia are wedged into the hierarchy of the existing text to make it work within different structures, while Archaeology of Reading has a bespoke XML tagging structure with a marginalia attribute.

But changing content IDs and crossing over the hierarchy of line elements or having a general term that does not include subtleties is not the methodological solution chosen for Melville’s Marginalia Online. This research tool uses software developed by the Whitman Project to generate the page coordinates of the already uploaded facsimile images, so that a page can be found directly through a word search. Melville’s marginalia are encoded in a <div> tag with several attribute values, so as to include all detail and information. The question posed by Ohge then was as follows: how much context is needed to understand marginalia, and how much granularity?
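To make the idea of a `<div>` carrying several attribute values more concrete, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The attribute values and the helper `encode_marginalia` are hypothetical and do not reproduce the encoding scheme of Melville's Marginalia Online; `type`, `subtype`, and `n` are used here simply because they are attributes that TEI allows on `<div>`.

```python
import xml.etree.ElementTree as ET


def encode_marginalia(kind, page, text):
    """Wrap a marked passage in a <div> whose attributes describe the mark."""
    div = ET.Element(
        "div",
        attrib={"type": "marginalia", "subtype": kind, "n": page},
    )
    div.text = text
    return div


# Invented example: a pencil score next to a passage on a hypothetical page.
element = encode_marginalia("score", "42", "passage marked by the reader")
print(ET.tostring(element, encoding="unicode"))
```

The more attributes such an element carries, the finer the searches it supports, which is one way of framing Ohge's question about granularity.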

In an intervention entitled ‘Editing Beckett’s Marginalia’, Dirk Van Hulle answered by stating that it depends on the author, the type of marginalia they wrote, and the resources available for the digital project that provides such context. One of the key elements that digital marginalia allows, as is the case with Beckett, is an insight not only into the reader himself, but the underlying structure of all his drafts and notebooks: a network of markings that, in turn, puts into context how his reading engendered his writing.

In order to make that network visible and searchable, one of the solutions going forward is to use IIIF (International Image Interoperability Framework) as a means of engaging with marginalia. Making resources IIIF compliant ensures they are interoperable with other software, as well as easy to maintain as an online resource with which scholars can interact. It is also culturally inclusive, as it operates on a ‘blank canvas’ principle, meaning that non-codex objects can be presented in full.
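For readers unfamiliar with IIIF, the following Python sketch shows roughly what an annotation attached to a region of a page image can look like. IIIF annotations follow the W3C Web Annotation model, and a region is addressed by appending an `#xywh=` fragment to the canvas identifier. The function `make_annotation`, the `example.org` identifiers, the coordinates, and the transcribed text are all placeholders invented for this illustration.

```python
import json


def make_annotation(annotation_id, canvas_id, xywh, text, language="fr"):
    """Build a Web Annotation that comments on a rectangular region of a canvas."""
    x, y, w, h = xywh
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {
            "type": "TextualBody",
            "value": text,
            "language": language,
            "format": "text/plain",
        },
        # The fragment selects the part of the page that the note refers to.
        "target": f"{canvas_id}#xywh={x},{y},{w},{h}",
    }


# Placeholder identifiers and coordinates, not a real manifest.
annotation = make_annotation(
    "https://example.org/annotation/1",
    "https://example.org/canvas/p1",
    (120, 340, 400, 60),
    "note marginale (placeholder transcription)",
)
print(json.dumps(annotation, indent=2, ensure_ascii=False))
```

Since each annotation is a self-contained record tied to a region, several layers of annotation can sit on the same canvas without altering the image itself.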

A piece of marginalia in Voltaire’s copy of the Marquis de Vauvenargues’s Introduction à la connaissance de l’esprit humain demonstrating a stark difference in line weight.

IIIF image viewers could potentially work with improving transcription software, such as Transkribus, to allow for comprehensive resources that can display an image of the page with all its marginalia, paratext, and physical attributes as well as an interactive description and viewable transcription. The ability to describe elements of a text accurately and efficiently via pinpointing areas that have their own locus of metadata, as IIIF is capable of, means that more effort can be devoted to accurate scholarship, which is precisely what Gillian Pink stated in her paper ‘Editing Voltaire’s commentary on Frederick II’s L’Art de la guerre – third time lucky?’ She proposed, for example, to use different colours for the different hands that worked on the manuscript (Frederick II, his secretary, and Voltaire) as a way to take advantage of annotation possibilities with IIIF. However, the question remains: how can we decide which textual blocks should be transcribed as a unit in order to properly represent Voltaire’s marginalia?

The various contributions to the Editing and Digitising Marginalia workshop helped us sketch some answers to this question. Nonetheless, many threads were left to pull, ensuring that, hopefully, there will be another workshop to show how all the projects have built on existing methods while defying their own limits and scope, so that we keep rediscovering authors through the marginal notes that they left.

– Joana Roque

Annotation in scholarly editions and research

It has been, alas, almost exactly a year since our last face-to-face Besterman Workshop at 99 Banbury Road. Of course, webinars allow more people to join, and to do so, most importantly, from the comfort of their homes, where they can sit comfortably and set their thermostats to the temperature that suits them best. The advent of the Zoom/Teams era, however, has brought with it a number of unfortunate consequences: discussions are not as lively as they used to be, asking a follow-up question is nearly impossible, and so are chats with friends and colleagues, before, during, or after the talk. Worst of all, we no longer get a chance to eat our beloved Leibniz or Belgian biscuits – but those, to be fair, had already become something of a rarity towards the beginning of 2018. Anyway: those of you who did attend our last face-to-face Besterman Workshops may remember this gloomy and cumbersome poster of mine hanging from the mantelpiece.

This poster was presented at a conference in Wuppertal, Germany, at the end of February 2019: ‘Annotation in Scholarly Editions and Research: Function – Differentiation – Systematization’. Organised by Julia Nantke (Universität Hamburg) and Frederik Schlupkothen (Bergische Universität Wuppertal), this two-day bilingual Anglo-German colloquium was a wonderful occasion to reflect on the age-old human habit of glossing, commenting, and generally interfering with other people’s work.

Alongside some theoretical papers (to mention but one, Willard McCarty’s brilliant keynote lecture on annotation as a knowledge-producing practice), the symposium featured several more practice-oriented talks that would have certainly been of interest to many of our Digital Humanities followers: some focused on how best to structure and visualise annotation in digital scholarly editions; others raised the question as to how to annotate audio-visual materials; and yet others investigated the extent to which annotation can be automated.

Some of the papers given at the ‘Annotation in Scholarly Editions and Research’ conference can now be read in a volume published last year (yes, in 2020!) by De Gruyter and available in print as well as an Open Access eBook.

My own contribution to the volume (which you can find here, should you want to read it) presents what I think might be an efficient and user-friendly three-level annotation system, the ‘reversible annotation system’, which I developed while working on Digital d’Holbach, a born-digital scholarly edition of Paul-Henri Thiry d’Holbach’s complete works. On this model, I argue, a single set of notes can be so structured as to cater to very different audiences, meaning that the edition can hope simultaneously to be user-friendly and cost-efficient. Should you have any comments or suggestions for improvement, please do not hesitate to let me know!

Ruggero Sciuto, University of Oxford

Introducing Tout d’Holbach

Have you ever used Tout Voltaire or the ARTFL Encyclopédie and thought: ‘Wow! This is so helpful!’? Have you ever planned on giving a Zoom talk on pandemics in Diderot and D’Alembert’s Encyclopédie and realised that all you had to do to get your primary sources was to search the database for ‘peste’, ‘pestilent.*’, ‘épidémi.*’, nothing more? Or maybe you wanted to write an article on Voltaire and dodos? You looked up ‘dodo’ in Tout Voltaire, and it only took you about three seconds to realise that you had pushed your quest for originality a bit too far. Have you ever wished that something like Tout Voltaire existed also for other authors? Well, if you work on d’Holbach, we’ve got good news for you!
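For the curious, the kind of wildcard search described above can be imitated in a few lines of Python. This is only a stand-in: it does not model the search engine behind Tout Voltaire or the ARTFL Encyclopédie, the helper names are hypothetical, and the sample sentence is invented. The sketch assumes that `.*` in a query means 'any further letters within the same word'.

```python
import re

QUERIES = ["peste", "pestilent.*", "épidémi.*"]


def to_word_regex(query):
    # ".*" in a word search means "any further letters in this word";
    # \w* keeps the match from running past the end of the word.
    return query.replace(".*", r"\w*")


def find_terms(text, queries=QUERIES):
    """Return every whole word in the text that matches one of the queries."""
    alternatives = "|".join(to_word_regex(q) for q in queries)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return pattern.findall(text)


# Invented sentence, for illustration only.
SAMPLE = "La peste ravagea la ville; un air pestilentiel annonçait l'épidémie."
hits = find_terms(SAMPLE)
print(hits)
```

A real database adds indexing, lemmatisation, and links back to the source texts, which is precisely what makes it faster than the three seconds it takes to discover the absence of dodos.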

The ARTFL Project at the University of Chicago and the Voltaire Foundation are very pleased to announce the release of Tout d’Holbach, a database that brings together fully searchable transcriptions of the vast majority of d’Holbach’s works. (If at this point you cannot be bothered to read more and wish to start experimenting with the database right away, here is the link:

At the moment, Tout d’Holbach only includes d’Holbach’s original writings, defined as those considered to be ‘œuvres originales publiées isolément’ (‘original works published separately’) in Jeroom Vercruysse’s fundamental Bibliographie descriptive des imprimés du baron d’Holbach (1971; new ed. 2017). (The Essai sur les préjugés and the Tableau des saints are not there yet, but they will be soon! We promise!) Moving forward, full transcriptions of d’Holbach’s translations and editions, respectively marked as Ds and Fs in Vercruysse’s bibliography, will be added, making the database more worthy of its high-sounding name. At the same time, we are also thinking about making Tout d’Holbach a bit less ‘d’Holbach’: adding to the database texts whose attribution to the Baron is highly controversial will put us, we hope, in a position to better understand the real contours of d’Holbach’s textual corpus, thus answering a question that has occupied scholars’ minds for more than two centuries.

Thanks to the generosity of the Andrew W. Mellon Foundation, the Voltaire Foundation is currently working on a born-digital critical edition of d’Holbach’s writings: Digital d’Holbach. Unlike Digital d’Holbach, Tout d’Holbach is not a critical edition: none of the texts is annotated, and the transcriptions, while broadly accurate, may contain occasional typos. Tout d’Holbach is a research tool, and one, we hope, that will prove invaluable to researchers collaborating on Digital d’Holbach as well as to scholars working on the European Enlightenment more broadly.

So, here is the link again for those of you who haven’t yet given in to temptation and already clicked on it:

P.S. If you have some time to spare while you #stayathome and would like to contribute to the project by checking the transcription of a section of one of d’Holbach’s works, or if you would like to know more about Digital d’Holbach, please email Ruggero Sciuto at

– Ruggero Sciuto and Clovis Gladstone

A born-digital edition of Voltaire’s Dialogue entre un brahmane et un jésuite

Just as the print edition of the Œuvres complètes de Voltaire is fast approaching its completion, we at the Voltaire Foundation are starting work on two new, highly ambitious digital projects thanks to the generosity of the Andrew W. Mellon Foundation: a digital edition of Voltaire’s works based on the Œuvres complètes (Digital Voltaire), and a born-digital edition of the works of Paul-Henri Thiry d’Holbach (Digital d’Holbach).

With a view to gaining the necessary skills required to begin my work on Digital d’Holbach, in autumn 2018 I attended an intensive course on digital editions run by the Taylorian Institution Library. Taught by Emma Huber in collaboration with Frank Egerton and Johanneke Sytsema, the course takes students through all the phases of the digital edition workflow, from transcription to publication and dissemination. It is a goal-focused, hands-on course during which students are warmly encouraged to create a born-digital edition of a short text from the Taylorian’s collections.

Although short and apparently light in tone, the piece that I chose to edit – Voltaire’s Dialogue entre un brahmane et un jésuite sur la nécessité et l’enchaînement des choses – is a key text in the evolution of Voltaire’s philosophical views. As the title suggests, the Dialogue hinges on the question of determinism (or fatalisme, in eighteenth-century French parlance) and touches on such crucial notions as moral freedom, causation, and the problem of evil. It was first published anonymously in the Abeille du Parnasse of 5 February 1752, and it then went through several reprints during Voltaire’s lifetime, with very few variants.

My edition of the Dialogue is of course not meant to replace the one already available in OCV. Rather, it was conceived to meet the needs of the broader public – and more specifically those of students. A very short introduction, displayed on the right-hand side, provides essential information on the philosophical issues at stake while situating the Dialogue in relation to other key texts by Voltaire. An original translation into English by Kelsey Rubin-Detlev makes the text more widely accessible, allowing students working in fields other than modern languages (e.g. philosophy) to engage with Voltaire’s ideas. High-quality pictures of the 1756 edition, which provides the base text, aim to give non-specialists a taste of what it feels like to leaf through a (dusty) eighteenth-century book. Finally, a modernised version of the text is available next to the facsimile, and a rich corpus of annotations – displaying in both the French transcription and the English translation and featuring links to several other digital resources (the ARTFL Encyclopédie and Tout Voltaire, but also Wikipedia and BibleGateway!) – aims to render the reading experience as informative and rewarding as possible.

But there is more to this edition than first meets the eye! For example, clicking on ‘Downloads’ in the menu bar brings up a fifth column from which the user is invited to download pictures as well as TEI/XML files, which can then be used as models to generate further digital editions. Also, a drop-down menu in the transcription column allows users to choose between two different versions of the text in addition to the modernised version displayed by default: a diplomatic transcription of the 1756 edition and a diplomatic transcription of a 1768 edition, which comes with its own set of images that are also available for download under a Creative Commons Licence. By looking at these texts, users will get a sense of how radically French spelling evolved in the mid-eighteenth century.

Readers of this blog are most cordially invited to browse my edition. Any feedback on content or presentation (e.g. the way footnotes or variants are displayed) would be greatly appreciated as I work towards an edition of a considerably longer text by d’Holbach. But more on that in the coming months!

Ruggero Sciuto




The humanist world of Voltaire’s correspondence

We know from reading Voltaire’s letters that he likes quoting – French literature in abundance, but also a fair amount of Latin. There is often a strong sense that he is quoting from memory, which is more than likely the lasting mark of his Jesuit teachers at Louis-le-Grand, who put Latin at the centre of the curriculum. Indeed, Voltaire had the benefit of some renowned Jesuit scholars as his teachers, notably Le Père Porée, who famously taught a ‘Senecan’ prose style, and Le Père Thoulier (later the abbé d’Olivet), a distinguished Cicero scholar who remained on friendly terms with Voltaire throughout his career.

Latin verse, in particular, played a preponderant role in Voltaire’s education, as poets were at the heart of college teaching, and Virgil, Ovid, and Horace had been by far the big three since the 16th century at least.[1] The Jesuits taught primarily by way of daily recitals (recitatio) of verse required of all students: ‘On attachait à la recitatio une importance dont nous n’avons pas idée aujourd’hui…’ (Dainville, p.175). Thus, students at Louis-le-Grand all committed large chunks of Latin verse to memory, both as models of imitation for learning to write and as a method of retaining information, as Voltaire would elsewhere describe the pedagogical approach of the Jesuit Claude Buffier: ‘Il a fait servir les vers (je ne dis pas la poésie) à leur premier usage, qui était d’imprimer dans la mémoire des hommes les événements dont on voulait garder le souvenir’.[2]

Collège de Louis le Grand, circa 1789.

Given this background, we aimed to examine Voltaire’s use of Latin quotations across his massive collection of correspondence, described by Christiane Mervaud as ‘perhaps his greatest masterpiece’. The Besterman edition of Voltaire’s correspondence, originally published in some 50 print volumes, and digitised in the early 2000s as part of the Electronic Enlightenment project, contains 21,256 letters of which 15,414 are written by Voltaire himself. It is astonishing, then, that this masterpiece remains relatively unstudied. Besterman identifies Latin passages when they are from the major writers (Horace, Virgil, Ovid, Lucretius) – the authors for whom there were concordances easily available in the 1950s and 1960s. In the case of lesser poets like Manilius, however, Besterman was obliged to leave the passages unannotated. These passages can now be easily identified thanks to new methods developed in the digital humanities. In particular, as part of this year’s research programme in the Voltaire Lab, we compared all of Voltaire’s letters to Latin digital sources in an effort to systematically identify all of his Latin quotations, while at the same time, as we’ll see below, exploring the social and intellectual networks over which these quotations were exchanged.

Marcus Manilius, Astronomicon, 1767.

Using sequence alignment algorithms designed to identify literary text re-use at scale – developed in collaboration with the ARTFL Project at the University of Chicago – we identified some 672 Latin citations in Voltaire’s correspondence by comparing the letters to the Packard Humanities Institute’s Classical Latin Texts (PHI) digital corpus. The PHI contains essentially all Latin literary texts written before A.D. 200, as well as some texts selected from later antiquity. The resulting alignments allow us to move beyond Besterman’s ad hoc manner of identifying quotations towards a more systematic understanding of Voltaire’s use of Latin authors.
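To give a rough sense of how this kind of matching works, here is a minimal sketch in Python. It is not the ARTFL aligner; it is a much simpler shared n-gram matcher, and the function names and the `min_words` threshold are our own illustrative assumptions. The two sample strings are the Cicero formula and the Marmontel letter discussed later in this post.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def shared_passages(source, target, min_words=3):
    """Return runs of at least `min_words` consecutive words found in both texts.

    Simplified stand-in for a text re-use detector: it indexes every
    n-gram of the source, then extends each hit in the target as far
    as the two texts keep agreeing word for word.
    """
    src, tgt = tokenize(source), tokenize(target)
    index = defaultdict(list)
    for i in range(len(src) - min_words + 1):
        index[tuple(src[i:i + min_words])].append(i)

    found = []
    j = 0
    while j <= len(tgt) - min_words:
        best = 0
        for i in index.get(tuple(tgt[j:j + min_words]), []):
            length = min_words
            while (i + length < len(src) and j + length < len(tgt)
                   and src[i + length] == tgt[j + length]):
                length += 1
            best = max(best, length)
        if best:
            found.append(" ".join(tgt[j:j + best]))
            j += best  # skip past the matched run
        else:
            j += 1
    return found

cicero = ("Tibi gratulor, mihi gaudeo. te amo, tua tueor. a te amari et "
          "quid agas quidque agatur certior fieri volo")
voltaire = ("que je ferai partir sur le champ. "
            "Te amo, tua tueor, te diligo, te plurimum")

print(shared_passages(cicero, voltaire))        # ['te amo tua tueor']
print(shared_passages(cicero, "Vale. Te amo"))  # []
```

The second call shows the trade-off behind any minimum match length: a two-word formula falls below the threshold and goes undetected, while lowering the threshold would let in many trivial matches.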

After some data pruning – the inclusion of several commentators and grammarians from Late Antiquity in the PHI dataset meant that some repeated matches were spurious – we reduced our set of Latin passages to 342 citations used by Voltaire himself in letters to his various correspondents. Here is a list of these quotations by Latin author in descending order:

Table 1. 342 individual Latin passages found in letters by Voltaire.

Overwhelmingly Voltaire prefers to quote Latin poets; and that Horace, Virgil and Ovid should be the top three is hardly surprising, though the presence of Horace is dominant. There is breadth as well as depth here, and the list goes beyond the usual suspects to include minor figures such as Manilius, Statius, and Cato the Elder. Does this mean, for instance, that Voltaire is quoting someone like Manilius from memory? If so, how interesting and altogether unexpected.

The next important question we broached concerned the recipients of Latin passages, i.e., who are the addressees of the letters in which these Latin quotations appear? In all we found 101 different recipients of at least some Latin, out of 1,465 total recipients in Voltaire’s correspondence (roughly 7%). This is quite small, as a proportion of addressees overall. So how can we gloss these names as members of a group, or network of Latin quotations?

Table 2. Addressees with more than five Latin quotations.

Using the ‘Procope’ social network ontology of the French Enlightenment, established by Dan Edelstein et al., at Stanford,[3] we were able to automatically assign social categories to our list of addressees, which while not a perfect system, nonetheless helped us understand the fundamentally ‘elite’ status of this sub-set of Voltaire’s correspondents.

Gender is an obvious criterion that is apparently lacking: all addressees are male apart from one. Given that men learned Latin, and women didn’t, the use of Latin quotations is self-evidently gendered in this case. This is further reinforced by the manner in which Voltaire uses two verses by Virgil with La Duchesse de Choiseul, his one female addressee, in a letter from 1771:

‘Pour moi, Madame, qui les aime passionément je vous dirai
Ante leves ergo pascentur in æthere cervi
Quam nostro illius labatur pectore vultus.’

‘Vous entendez le latin, Madame, vous savez ce que celà veut dire:
Les cerfs iront paître dans l’air avant que j’oublie son visage.’ [4]

After quoting the two lines from the Bucolics, Voltaire goes on to translate them for Madame de Choiseul, even though she can presumably understand the Latin – a case of early-modern ‘mansplaining’ in action.

Within the group of 101 addressees, there is a clearly-defined social group of old, close friends from school (those with whom he had learned Latin), as well as an overlapping sub-group in Normandy, or in one case from Voltaire’s early law career:

Addressees from Louis-le-Grand, where Voltaire learned Latin:

  • The Marquis d’Argenson (later foreign minister)
  • The Comte d’Argenson (later war minister)
  • The Duc de Richelieu (soldier and leading courtier)
  • The Comte d’Argental, conseiller au parlement de Paris
  • Pierre-Robert Le Cornier de Cideville, conseiller au parlement de Rouen

Other old friends from the overlapping Normandy/law group:

  • Formont, a wealthy, talented light poet who was also friends with Cideville.
  • Theriot, an early friend of Voltaire’s, from when they were both young apprentice lawyers, who was also friends with Formont and Cideville.

Otherwise, we find many cultivated acquaintances in this list who are themselves authors: Frederick, Algarotti, D’Alembert, etc.; along with one of Voltaire’s teachers from Louis-le-Grand: d’Olivet, translator of Cicero and Demosthenes into French, elected to the Académie in 1723. Clearly, Voltaire’s use of Latin was a means of determining readership. By constructing an epistolary community with selected groups of correspondents, Voltaire underscored their shared experiences and humanist culture.

But, to what extent was this sort of cultural exchange reciprocal? I.e., if Voltaire writes to you quoting Latin poets, do you feel obliged to respond in kind? What does it mean, for instance, that Voltaire uses Latin in so many letters to Frederick, and yet the prince never once uses Latin in return? Socially, the 41 respondents identified belong by-and-large to the same ‘elite’ categories of government or aristocracy, although there is a markedly greater presence of hommes de lettres (an ‘intellectual network’ that overlaps with the ‘social networks’ drawn from Procope) in this second list. See Table 3.

Table 3. Respondents with more than two Latin citations.

These are just some of the preliminary results we have begun to process in the context of a larger project on Voltaire’s culture of text re-use (including his penchant for ‘self-plagiarism’). As with most digital humanities projects, initial computational analyses don’t always produce ‘clean’ results, or cut-and-dried interpretations: some of the results have to be examined carefully, and some – as was the case for the grammarians and commentators mentioned above – will prove spurious or misleading. One begins by asking one set of questions – can we identify Voltaire’s use of Latin and verify Besterman’s attributions – and ends up with new ones: e.g., with whom did Voltaire use Latin, and how? Equally, we could extend these questions by examining other literary quotations, e.g., from French or Italian authors, and by including other correspondence collections, comparing Diderot’s and Rousseau’s use of Latin, for instance, to that of Voltaire.

Ideally, this sort of experimental research approach also generates new research questions, ones that would have been difficult to frame outside of the digital environment. In this case, we were quickly confronted with the notion of what constitutes an instance of ‘re-use’ as opposed to an allusion or more oblique cultural reference. For example, our algorithm identified this passage from Cicero’s epistles:

‘Vale. CICERO BASILO S. Tibi gratulor, mihi gaudeo. te amo, tua tueor. a te amari et quid agas quidque agatur certior fieri volo…’

as a potential re-use employed by Voltaire in a letter to Marmontel from 1749:

‘Si vous recevez ma lettre ce soir, vous pourrez m’envoyer votre poulet pour m. de Richelieu, que je ferai partir sur le champ. Te amo, tua tueor, te diligo, te plurimum, &c.’ [5]

Is this re-use or not? Besterman makes no mention of Cicero in his annotation, but rather places this passage into a more generic class of ‘Roman epistolary formulas’. But perhaps there is more going on here; perhaps the model of Cicero’s epistles – central to the Jesuit syllabus – remains at the forefront of Voltaire’s mind when he himself is in the act of letter-writing. With the sorts of addressees for whom Voltaire uses Latin quotations he may likewise use a Ciceronian subscription. Here the Ciceronian model shapes Voltaire’s epistolary rhetoric.

Finally, pushing this line of enquiry a bit further, we came across another discovery: there are reduced versions of the passage, “Vale. Te amo”, which Voltaire uses extensively in the correspondence, and in particular with the social network of old school friends outlined above. This passage is in fact too small to be identified by our matching algorithms, and we would furthermore be a bit hard-pressed to classify it as a singularly Ciceronian borrowing. And yet…

– Nicholas Cronk and Glenn Roe

[1] See François de Dainville, L’Education des jésuites (XVIe-XVIIIe siècles) (Paris, Minuit, 1978).

[2] Voltaire, Siècle de Louis XIV, ‘Catalogue des écrivains’, OCV, vol.12.

[3] See Maria Teodora Comsa, Melanie Conroy, Dan Edelstein, Chloe Summers Edmondson, and Claude Willan, ‘The French Enlightenment Network’, The Journal of Modern History 88, no. 3 (September 2016): 495-534.

[4] [D17251]. Voltaire [François Marie Arouet], ‘Voltaire [François Marie Arouet] to Louise Honorine Crozat Du Châtel, duchesse de Choiseul [née Crozat]: Monday, 17 June 1771’. In Electronic Enlightenment Scholarly Edition of Correspondence, University of Oxford.

[5] [D3918]. Voltaire [François Marie Arouet], ‘Voltaire [François Marie Arouet] to Jean François Marmontel: Friday, 2 May 1749’. In Electronic Enlightenment Scholarly Edition of Correspondence, University of Oxford.

The Newberry French Revolution Collection at ARTFL

As we begin planning Digitizing Enlightenment IV, which will take place in the context of the ISECS Congress in Edinburgh in July 2019, we are keen to broaden the scope and breadth of the Digitizing Enlightenment community in order to highlight new, and existing, digital projects across the interdisciplinary spectrum of eighteenth-century studies. This post, based on work presented at the Digitizing Enlightenment III workshop held in Oxford in July 2018, demonstrates how to identify text reuse – citations, borrowings, plagiarisms – as well as other techniques for leveraging freely available large data-sets from the 18C.
– Glenn Roe, Voltaire Lab

The incredible richness of the Newberry Library’s French Revolution Collection (FRC) has long been known. It consists of more than 30,000 pamphlets and more than 23,000 issues of 180 periodicals published between 1780 and 1810, representing the opinions of all the factions that opposed and defended the monarchy during the turbulent period between 1789 and 1799, and also contains innumerable ephemeral publications of the early First Republic. The Newberry has released digital copies of more than 35,000 pamphlets totalling approximately 850,000 pages. Not only has the Newberry made the collection available to the public, but it has released a data feed of the entire collection, consisting of the Library’s exceptional metadata describing each object, the OCR text data, and links to the digital facsimiles accessible from the Internet Archive, encouraging researchers and instructors to incorporate the digital collection in new kinds of scholarship and engagement.

In order to facilitate experimental work at ARTFL on this unparalleled resource, we have loaded two versions of this collection – based on a download of the collection from the Newberry’s GitHub repository in November 2017 – into PhiloLogic4, the latest release of ARTFL’s text analysis software. The full version contains all 38,377 documents dating from the 16th century to the end of the 19th century. Our second build attempts to eliminate duplicate documents, is restricted to the period 1787-1799, and thus contains 26,445 documents. Additional implementation information and full open access to both versions of the FRC collection are available online. The quality and coverage of the FRC texts make it an ideal environment to test a variety of experiments and algorithms to enhance access and open new kinds of approaches using the 1787-99 sample data. At the bottom of the ARTFL FRC page, we have provided links to several different models for examining the collection which are based on extensions to the PhiloLogic4 package.

The simplest model is a document level search which returns matching documents by relevancy ranking based on Python Whoosh. This functions somewhat like a Google search on the collection, with links to the page images of the document or specific instances of the search words in context. For example, the results of a search for “conspirateurs aristocrates ennemis étrangères royalistes” can be seen here.
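For readers unfamiliar with relevancy ranking, the following minimal Python sketch shows the general idea. It does not use Whoosh and does not reproduce its scoring; it is a simplified TF-IDF stand-in, and the three short "documents" are invented toy strings, not texts from the FRC.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def rank(query, documents):
    """Rank documents against a query with a plain TF-IDF score.

    `documents` maps an identifier to its text. Documents sharing no
    term with the query are left out of the result.
    """
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    n_docs = len(documents)
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, doc_counts in counts.items():
        total = sum(doc_counts.values())
        score = 0.0
        for term in query_terms:
            # document frequency: how many documents contain the term
            df = sum(1 for c in counts.values() if term in c)
            if df == 0 or total == 0:
                continue
            idf = math.log(1 + n_docs / df)  # rarer terms weigh more
            score += (doc_counts[term] / total) * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented toy documents, for illustration only.
toy_docs = {
    "a": "les conspirateurs aristocrates et les royalistes",
    "b": "les ennemis de la patrie",
    "c": "décret sur les assignats",
}
ranking = rank("conspirateurs aristocrates ennemis royalistes", toy_docs)
print([doc_id for doc_id, _ in ranking])  # ['a', 'b']
```

A document matching several rare query terms rises to the top, which is the behaviour a 'Google-like' search over the pamphlets relies on.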

The second approach is the application of a Topic Model algorithm to the collection. Topic Models, which are widely used in digital humanities, are a set of unsupervised learning algorithms that divide collections into a specified number of clusters based on the vocabulary of each document. The results of the Topic Model have been added to the metadata of the PhiloLogic4 build of the 1787-99 sample data. Each document is identified as having a first and second topic, denoted as A or B, with a number from 00 to 49 as listed in this TABLE. The first column is the topic number; the second is one or more English keywords, which can also be searched. The third column is the top 3 weighted words (features) of that topic, and the fourth column is the rest of the top 10, all of which are shown in relative weight order. Thus, A29 will return the documents that have money assignats as the top weighted topic. Searching for “money” in topic models will return documents with this as either the first or second topic. An alternative use of this data is to copy some or all of the terms in columns 3 and 4 into the Whoosh search form and get the documents in ranked relevancy order.
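As an illustration of the labelling scheme, here is a small Python sketch that turns a document's topic weights into a first ('A') and second ('B') topic code. How the codes were actually derived for the PhiloLogic4 build is not described here, so this is only an assumption about one plausible way to do it; the function name and the toy weights are hypothetical.

```python
def topic_labels(weights):
    """Return (first, second) topic codes such as ('A29', 'B07').

    `weights` is a list of per-topic weights for one document, indexed
    by topic number. Ties are broken in favour of the lower topic number.
    """
    if len(weights) < 2:
        raise ValueError("need at least two topics")
    ranked = sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)
    return f"A{ranked[0]:02d}", f"B{ranked[1]:02d}"

# Toy weights over 50 topics: topic 29 dominates, topic 7 comes second.
toy_weights = [0.01] * 50
toy_weights[29] = 0.40
toy_weights[7] = 0.12
print(topic_labels(toy_weights))  # ('A29', 'B07')
```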

Our first presentation of this work, at Digitizing Enlightenment III, showed results from applying the latest version of our sequence aligner to detect the reuse – citations, borrowings, plagiarisms, and so on – of pre-Revolutionary documents during the Revolutionary period. Sequence alignment is a family of algorithms used in a surprising range of disciplines, from genetics to text analysis, to identify similar segments of arbitrary length. For this work, we aligned the FRC 1787-99 sample against ARTFL’s Frantext pre-1788 collection. The Frantext sample contains 1,263 documents and is particularly strong in 18th-century holdings. We loaded the results of the alignment run in a dedicated database which can be queried in a variety of ways, such as by source and/or target metadata as well as by words in matching passages.
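One classic member of this family, borrowed from genetics, is Smith-Waterman local alignment. The Python sketch below applies it to word tokens rather than nucleotides. It is offered only as an illustration of the principle: we make no claim that the ARTFL aligner works this way, and the scoring values are arbitrary assumptions. The source line is the opening of the Contrat social; the 'borrowing' is an invented variant.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment of two token lists.

    Returns (score, aligned_a, aligned_b) for the best-scoring pair of
    similar segments; '-' marks a gap in either sequence.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, out_a[::-1], out_b[::-1]

source = "l homme est né libre et partout il est dans les fers".split()
# Invented variant, for illustration only.
reuse = "on a dit que l homme est né libre mais partout il est dans les fers".split()

score, seg_a, seg_b = local_align(source, reuse)
print(score)            # 21
print(" ".join(seg_b))  # l homme est né libre mais partout il est dans les fers
```

Unlike exact n-gram matching, the alignment tolerates the substituted word ('et' against 'mais') and still recovers the whole passage as a single match.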

The public database (June 22, 2018 build) found 8,937 aligned passages, of which around 1,000 were identified algorithmically as banalities. Filtering out shorter alignments (fewer than 10 words) leaves just under 7,000 passages. It is important to note that these numbers are approximate, since they can vary significantly depending on the approach we use to identify and merge, where appropriate, longer passages. The general frequencies are not particularly surprising. The following is a table of the number of borrowed passages in the FRC by author.

Montesquieu – 1,315

Rousseau – 1,133

Voltaire – 979

Mably – 303

Aulony – 263

Racine – 168

Helvétius – 167

D’Holbach* – 146


Saint-Simon – 135

Bossuet – 110

La Fontaine – 94

Diderot – 85

Corneille – 72

Mirabeau – 71

Boileau – 69

Bernardin – 67

Montaigne – 65

*D’Holbach appears as two entries due to slight metadata differences.

The yearly distribution of borrowings from the top three Enlightenment authors again follows a reasonable pattern.

The annual distribution in the FRC of the 536 passages derived from Rousseau’s Contrat Social also seems reasonable, and matches what we would expect from other evidence.

While the global numbers are interesting, if not very surprising, there are a number of specific texts and authors which would warrant further investigation. There are numerous chapbooks, such as the Calendrier moral, 1794, which are interesting because of their selection of inspiring passages from various authors. Jean-Jacques Barthélemy’s L’Accord de la religion et de la liberté (1791) features some 25 long extracts from d’Holbach’s Système social.

The alignment database is available to the public. The database has a variety of useful features. This link will push a search for all of the aligned passages in the FRC from Rousseau’s Contrat Social greater than 10 words. The report is laid out chronologically (in this case by FRC year). Each instance shows the matching passages with available metadata, links to the context of each passage, and a button to highlight the differences in each matching pair. The facets on the right will allow you to get frequencies by author, title, year and so on. Clicking on those will return the corresponding text pairs.

We anticipate further experimental work on the FRC, most notably in using the excellent subject information as a way to assess the accuracy of Topic Modelling and to consider supervised learning algorithms to further classify the collection by subject.

It is our pleasure to acknowledge that the Newberry Library has released this extraordinary resource under the Open Data Commons Attribution License, ODC-BY 1.0. We believe that this splendid collection and the Newberry’s release of all of the data will facilitate a generation of ground-breaking work in Revolutionary studies. If you find the collection useful, please do contact the Newberry Library to congratulate them on this wonderful initiative and to let them know how their efforts contribute to your research.

We would love to hear from you. Please send comments, suggestions and problem reports to

– Clovis Gladstone and Mark Olsen


Poetry in the digital age: the Digital Miscellanies Index and eighteenth-century culture

For most of us, reading for pleasure usually means getting stuck into some fiction or non-fiction. Poetry is a less common diversion, but we still have an appetite for poems to dip into, to find solace in, to memorise and share. And we can choose from an array of collections that promote poetry as an everyday companion, a form of therapy, and a tradition of national interest. For readers looking for peace of mind, The Emergency Poet: An Anti-Stress Poetry Anthology offers comfort, while the popular twin collections of Poems That Make Grown Men (or Women) Cry present a cult of sensibility for the modern age.

It was in the eighteenth century that poetry collections like these became a staple of literary publishing in Britain. The tradition of printed collections of English poetry stretches back to the sixteenth century, with Songes and Sonettes (1557), an edition of short lyric poems compiled by the publisher Richard Tottel, generally regarded as the foundation of English Renaissance poetry and the most important early printed collection of English verse. But it was not until the eighteenth century that collections of poems by several hands, with prose as a secondary feature, became one of the most common forms in which British readers encountered poetry. Like their modern counterparts, eighteenth-century editors and publishers sought to gain a foothold in a crowded market by targeting specific audiences and promoting the benefits of reading poetry. Some produced didactic collections for young people (Poems for Young Ladies); others pitched their collections to lovers in need of poetic inspiration (The Lover’s Manual); and many more set their sights on a local audience (The Oxford Sausage).

Poems for Young Ladies (1767), edited by the poet Oliver Goldsmith.

Collections like these shaped the ways in which poetry was written and read throughout the eighteenth century. Yet until recently relatively little was known about their contents. Thanks to the Digital Miscellanies Index (DMI), this is no longer the case. The DMI provides a searchable record of the contents of over 1,600 collections of poems by several hands published over the course of the eighteenth century. These books are sometimes referred to as anthologies, as most poetry collections are today. But the word anthology, derived from the Greek for ‘a gathering of flowers’, has connotations that sit uneasily with many eighteenth-century poetry collections. Few collections produced in this period claimed to present the best of English poetry, a rationale often seen as characteristic of anthologies (collections that cull the flowers of the poetic tradition). As a result, several scholars, myself included, prefer the term miscellany. Derived from the Latin miscellanea, meaning a ‘hotchpotch’ of foodstuffs, it captures the dominant characteristic of most eighteenth-century collections: variety. A typical miscellany offers a varied feast of poems to entertain readers with varied tastes and personalities.

The DMI was launched in 2013, following three years of development and data collection carried out by a team based at the University of Oxford. Led by Abigail Williams and Jennifer Batt, the project was funded by the Leverhulme Trust. In 2014, another Leverhulme grant set in motion the second phase of the project. One of the aims of this phase, to be completed in 2017, is to harness the data now accessible via the DMI to shed new light on how miscellanies evolved, how they packaged and popularised poetry, and on the habits of their readers. At the same time, we are working with the Bodleian’s Digital Libraries team to develop the DMI into a more flexible and wide-ranging resource, and last month we celebrated a milestone on this road. The thirty-strong audience at Lines of Connection, a conference I co-organised as part of the project, were among the first to see the DMI’s new search interface, which replaces the beta site created in 2013.

The Book of Fun (1759), a miscellany dominated by seventeenth-century verse.

The new search platform is much more than a digital facelift for the DMI. It provides access to a database undergoing expansion: the latest version includes new records for miscellanies published between 1680 and 1699, and future updates will extend the DMI’s coverage further back to Tottel’s foundational Songes and Sonettes. The redeveloped interface also enables users to explore the data in new ways. Keyword and phrase searching is quicker and more extensive with the new basic search function. There is also the option to filter the records using a number of facets, which display and rank the data in ways that suggest key trends and lines of enquiry. For instance, clicking on ‘Poem’ under ‘Content Type’, then selecting the ‘Related People’ facet, reveals a list of almost one hundred of the most prominent authors in the database, ranked according to the number of poems attributed to them. At the top of the list is John Dryden, with around 1,500 poems; the highest ranked French author is Nicolas Boileau-Despréaux, with over 120 poems in English translation (the DMI does not record appearances of poems in foreign languages). Although these figures should not be seen as straightforward indications of popularity, they remind us that many of the most widely read poets of the eighteenth century were those who had been active in the late seventeenth century. In his imitation of Horace’s epistle to Augustus (written 1737), Alexander Pope observed that the verse of his seventeenth-century predecessors was scattered ‘Like twinkling stars the Miscellanies o’er’. The DMI has made it possible to see these stars, and the sky around them, more clearly.

– Carly Watson

Digitizing Raynal (and Diderot): New Digital Editions of the Histoire des deux Indes

A collaborative digital research project

On the heels of Cecil Courtney and Jenny Mander’s recent publication, Raynal’s ‘Histoire des deux Indes’: colonialism, networks and global exchange (OSE, 2015), I am pleased to announce a new international research project aimed at further exploring Raynal’s monumental work and its impact on Enlightenment thought. Thanks to the generous support of the Consortium for the Study of the Premodern World at the University of Minnesota, the Centre for Digital Humanities Research at the Australian National University, Stanford University Libraries, and The ARTFL Project at the University of Chicago, we have recently completed the digitization and text encoding (in TEI-XML) of the three primary editions of the Histoire philosophique et politique des établissements et du commerce des Européens dans les deux Indes. These editions – the first edition of 1770, the second of 1774, and the 1780 third edition – were those that Raynal himself oversaw during his lifetime.

Our digital editions are based on high quality PDFs provided by the BNF’s Gallica online library (1770 and 1780 editions) and the Bodleian’s Oxford Google Books Project (1774 edition). A preliminary search interface has been built using the ARTFL Project’s PhiloLogic software and can be accessed here: Raynal search form. Users can query one or all of the above editions, which represent the first publicly available full-text digital edition(s) of the Histoire des deux Indes. In the coming months we will release a new version of the database running on ARTFL’s state-of-the-art PhiloLogic4 system, along with a preliminary ‘intertextual interface’ that will aim to incorporate the text of the three separate editions into one reading interface.


Title page and frontispiece of the 1780 edition of Raynal’s Histoire des deux Indes (Gallica).

Diderot, Hornoy, and the 1780 edition

What is perhaps most exciting about these new digital resources is the inclusion of a unique 1780 edition of the Histoire des deux Indes recently made available by the BNF. Acquired at public auction in March 2015, this particular edition had been conserved since the late 18th century in the private library of Alexandre Marie Dompierre d’Hornoy (1742-1828). A lawyer at the Parlement de Paris and great-nephew of Voltaire – he in fact inherited Jean-Baptiste Pigalle’s infamous nude statue of Voltaire upon his great-uncle’s death – Hornoy corresponded with many of the philosophes, Diderot included. His copy of the Histoire contains pencil marks in the margins of some passages, an unremarkable fact, perhaps, were it not for a note written by Hornoy just above a three-page insert at the beginning of the first tome. The handwritten tables included in the insert list all the sections marked in pencil over the four volumes of text: ‘mourceaux qui sont de M. Diderot’, Hornoy writes, ‘marqués en crayon par Mme de Vandeul’. Madame de Vandeul was, of course, Diderot’s daughter.


Handwritten insert of the 1780 edition (Gallica)

The existence of such an annotated volume of the Histoire was posited in the 19th century, notably by Joseph Marie Quérard in his Supercheries littéraires dévoilées (5 vols., 1845-1856). Quérard claimed that there existed a copy of the 1780 edition on which Diderot himself had marked in pencil all the passages that belonged to him [1]. According to Quérard, this copy became the property of Madame de Vandeul shortly after Diderot’s death. Whether or not the copy acquired by the BNF is the same as that owned by Vandeul we cannot say for sure, but Herbert Dieckmann, in his inventory of the ‘fonds Vandeul’, also mentions the hypothetical existence of a copy of the in-4o edition (i.e. 1780) that was purportedly annotated by hand, but that had since been lost [2].

Some preliminary experiments

While consensus as to the validity of Hornoy’s assertion that the marked sections are in fact those authored by Diderot will most likely take years to accrue, we can begin, using the new digital edition, to ask some basic questions as to the authorship claims indicated in the text. Thanks to extensive markup in TEI-XML notation, sections purportedly belonging to Diderot are clearly indicated, and perhaps more importantly, can be extracted as one test corpus. Using some basic statistical measures drawn from authorship attribution studies, or Stylometry, we can begin to think about how the ‘Diderot’ sections may, or may not, differ stylistically – i.e. in terms of comparative word usage over the most common words, an established metric of ‘authorship’ in stylometry and forensic linguistics – from the rest of the text.
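The basic measure involved, relative frequencies over the most common words, is easy to sketch. The Python below is a minimal illustration under our own assumptions (function names and toy strings are hypothetical); it is not the procedure implemented in any particular stylometry package.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def top_words(texts, n):
    """Return the n most frequent words across all the given texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]

def profile(text, vocabulary):
    """Relative frequency of each vocabulary word in one text.

    The resulting vector is the 'stylistic fingerprint' that later
    steps (clustering, PCA) compare between texts.
    """
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [counts[word] / total for word in vocabulary]

# Toy strings standing in for full text sections.
toy_texts = ["le le le de de la", "le de et"]
vocabulary = top_words(toy_texts, 2)
print(vocabulary)                          # ['le', 'de']
print(profile("le de le la", vocabulary))  # [0.5, 0.25]
```

Working with function words such as 'le' and 'de' is deliberate: they are frequent in every text and largely independent of subject matter, which is why they are treated as a signal of habit rather than topic.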


Page from 1780 edition with ‘Diderot’ section marked in pencil (Gallica)

Working with the Centre for Literary and Linguistic Computing at the University of Newcastle (Australia), and in particular with their Intelligent Archive software for stylistic and statistical text analysis, we extracted the top 200 words for each ‘author’ (i.e. those drawn from sections putatively by Diderot, and the remaining ‘Raynal’ sections). As a result, we were left with 4 ‘Diderot’ tomes (containing all of the text marked in pencil) and 4 ‘Raynal’ tomes (containing the remainder), representing their unique word lists over the entire edition. For a first preliminary test, we ran a cluster analysis on the 8 tomes to see if they would cluster together or separately:


Cluster analysis of ‘Diderot’ tomes vs. ‘Raynal’ tomes, based on top 200 word lists

Cluster analysis works by grouping (or clustering) the most similar texts first and joining the most distinct last, in this case into 2 branches. A division like the one above, separated into two distinct ‘trees’, is a clear indication that the texts in each of the two branches are highly likely to be those of two different authors.
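The logic of such an analysis can be sketched in a few lines of Python. The version below uses Euclidean distance with average linkage, which is an assumption on our part: we do not know which distance and linkage the Intelligent Archive software applies, and the four toy word-frequency vectors are invented.

```python
import math

def distance(u, v):
    """Euclidean distance between two word-frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def agglomerate(profiles, k=2):
    """Average-linkage agglomerative clustering down to k clusters.

    `profiles` maps a text name to its frequency vector. The two
    closest clusters are merged repeatedly, so the most similar texts
    are joined first and the most distinct groups last.
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(cluster) for cluster in clusters)

# Invented toy vectors: two 'D' texts and two 'R' texts.
toy_profiles = {
    "D1": [0.10, 0.02], "D2": [0.11, 0.03],
    "R1": [0.02, 0.10], "R2": [0.03, 0.11],
}
print(agglomerate(toy_profiles))  # [['D1', 'D2'], ['R1', 'R2']]
```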

Principal component analysis (PCA) provides another method of examining our corpora. PCA is a procedure for identifying a smaller number of uncorrelated variables, called ‘principal components’, from a large set of data. The goal of PCA is to explain the maximum amount of variance with the fewest number of principal components. In our case, it is a technique that allows for the first two principal components of our two sets of texts, i.e. their word variance, to be plotted on a bi-axial or two-dimensional graph. One of these plots (using the 100 most frequent words of the full text) with both text corpora divided into 10,000 word blocks, is shown below.


Principal component analysis using 10,000 word blocks and 100 most frequent words

The disparity in size of our two test corpora meant that while there were 68 text sections for Raynal (in green), there were only 14 for Diderot (in blue). Nonetheless, the separation between the two authorial sets is almost complete, with just two of the Diderot sections located in the outer fringes of the Raynal set. Since the word variables underlying this plot were the 100 most frequent words of the whole text, this is a convincing stylistic division, one that suggests a strong distinction in terms of authorship signal between the two sets.
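For readers curious about the mechanics, here is a minimal pure-Python sketch of a two-component PCA using power iteration on the covariance matrix. It is not the Intelligent Archive implementation, and in practice one would use a numerical library; the toy frequency vectors are invented and stand in for the word-frequency profiles of text blocks.

```python
import math

def center(rows):
    """Subtract each column's mean so every variable has mean zero."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(d)]
            for a in range(d)]

def matvec(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def leading_eigenvector(matrix, iterations=500):
    """Power iteration. The uneven start vector lowers the (small)
    risk of starting orthogonal to the leading eigenvector."""
    v = [1.0 / (k + 1) for k in range(len(matrix))]
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def pca_2d(rows):
    """Project each row onto the first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    pc1 = leading_eigenvector(cov)
    lam1 = sum(x * y for x, y in zip(pc1, matvec(cov, pc1)))
    # Deflation: remove the first component, then repeat.
    d = len(cov)
    deflated = [[cov[a][b] - lam1 * pc1[a] * pc1[b] for b in range(d)]
                for a in range(d)]
    pc2 = leading_eigenvector(deflated)
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    return [(dot(row, pc1), dot(row, pc2)) for row in centered]

# Invented toy data: three blocks from one 'author', three from another.
toy_rows = [
    [0.10, 0.02, 0.05], [0.11, 0.03, 0.05], [0.09, 0.02, 0.06],
    [0.02, 0.10, 0.05], [0.03, 0.11, 0.06], [0.02, 0.09, 0.05],
]
coords = pca_2d(toy_rows)
for x, y in coords:
    print(f"{x:+.3f} {y:+.3f}")
```

On the toy data the first three blocks land on one side of the first axis and the last three on the other, which is the kind of separation the plot above displays for the real sections.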

In order to account for the size discrepancy between the two corpora, we ran another PCA test but this time we increased the number of Diderot sections by segmenting his text into 5,000 word blocks and running these against the previous Raynal 10,000-word sections. This plot is shown below:


Principal component analysis on 5,000 word blocks (Diderot) and Raynal, using 100 most frequent words

Here we see the same sort of authorial/stylistic separation as we saw above, but this time (with the Diderot sections halved in size) the distinction is even stronger, as there is only one section located within the Raynal set of entries, indicating an even greater likelihood that the sections marked in pencil were written by a different author than the rest of the 1780 edition.
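The segmentation step itself is simple, as the short Python sketch below suggests. Dropping a shorter trailing remainder is our own assumption; how the actual analysis handled leftover text is not stated here.

```python
def word_blocks(tokens, size):
    """Split a token list into consecutive blocks of exactly `size` tokens.

    A trailing remainder shorter than `size` is dropped (an assumption
    made for this sketch).
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]

# Toy token list: halving the block size doubles the number of sections.
toy_tokens = [f"w{i}" for i in range(140)]
print(len(word_blocks(toy_tokens, 10)))  # 14
print(len(word_blocks(toy_tokens, 5)))   # 28
```

Smaller blocks yield more data points for the shorter corpus, at the cost of noisier word frequencies within each block.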

These are obviously very rudimentary experiments, but they nonetheless indicate several promising future avenues of exploration. Moving forward, we intend to apply a full suite of computational and stylistic approaches to the 1780 edition and its predecessors, including sequence alignment tools developed by ARTFL, text collation software, and the MEDITE system developed by the labex OBVIL at the Sorbonne for computational genetic criticism. All of these approaches will allow us to explore the textual evolution of the Histoire from 1770 to 1780 in an unprecedented manner, as well as its relationship to other Enlightenment texts and text collections such as Electronic Enlightenment, TOUT Voltaire, and the Encyclopédie.

– Glenn Roe

*I would especially like to thank Alexis Antonia and the Centre for Literary and Linguistic Computing at Newcastle for their generous help with the above stylistic analyses.

[1] See Michèle Duchet, Diderot et l’Histoire des deux Indes ou l’écriture fragmentaire, Paris, Nizet, 1978, p. 22.

[2] Herbert Dieckmann, Inventaire du fonds Vandeul et inédits de Diderot, Genève, Droz, 1951.



Claire Trévien discussed in an earlier post the Candide iPad app which the Voltaire Foundation has produced in association with the Bibliothèque nationale de France and Orange. There have been over 7000 downloads since January, so if you haven’t seen it yet, take a look – it’s beautiful and free!

At the core of the app is René Pomeau’s critical edition of Candide published by the Voltaire Foundation (OCV, volume 48), but lots more has been added. A guiding idea behind the project was to make the text accessible to teenage readers (for example, by supplying a parallel set of annotations aimed specifically at that group), and to judge by the tweeted and blogged responses, it is succeeding. In what is certainly the best (and shortest) review ever given to a VF publication, one French fan has written that the app is “bien foutue”.

But the app is interesting to readers at all stages. You can listen to Candide as well as read it, and the actor Denis Podalydès gives a beautifully clear and cool reading. It’s great to discover the music of Voltaire’s prose: I find that hearing the text read aloud brings out nuances of humour and irony that I’ve missed in silent reading.

Another special feature of the app is the set of images of the La Vallière manuscript, which dates from 1758, the year before Candide was published. This manuscript has been well known since the 1950s, when it was discovered by Ira Wade, and for this app, the Bibliothèque de l’Arsenal has made new high-resolution images. It is possible to study in a split screen images of the manuscript alongside the subsequent published version of the text, or to look at the manuscript on a full screen and even to enlarge any part of it.

The quality of the images is amazing: as you enlarge them, you can almost feel the secretary Wagnière writing as Voltaire dictated, and you can experience in close-up the moments when Voltaire in his own hand intervenes or corrects his secretary’s draft. In Chapter 1, we remember how Pangloss is introduced, as a teacher of “la métaphysico-théologo-cosmolonigologie”. In the manuscript, we can see how Voltaire first tried “métaphisico-theolo-cosmolo-méologie”, then changed the last word to “mattologie” – here you can actually catch Voltaire in the process of inventing a new word. In Chapter 4, Candide recalls his love for Cunégonde: “il ne m’a jamais valu qu’un baiser et vingt coups de pied au cul”… When you look at the manuscript, you can see how the words “dans le cu” are added, in Voltaire’s own hand, as an afterthought, squeezed into the right-hand margin. Of course all this information is in the apparatus of the VF edition, but no description, however accurate, quite replaces the experience of looking at the original manuscript. Digital images of this quality give us a vivid sense of spying on Voltaire while he is writing.

Nicholas Cronk, Director