Use the metadata to build maps, info graphics, or interactive resources.

Documentation about the MODS-based .txt datasets

In addition to the MODS catalogue records and TEI text transcripts for each journal,
we’re now making available three .txt-file datasets for each journal we post on the MJP
Lab’s Sourceforge repository. These datasets are derived from the MODS files,
aggregating most of the data
recorded there; they thus give users the MJP’s catalogue information about each journal
without the hassle of having to extract, concatenate, and organize the data
themselves.

Why there are three datasets—and what appears in each one

We’re making three .txt datasets available for each journal that appears on the
Sourceforge repository, since each dataset configures the MODS data in a different way
and allows for different kinds of analyses. Here, for instance, are the three datasets
we’ve uploaded for Poetry Magazine:

  • Poetry_1.journaloverview.txt
  • Poetry_2.everytitle.txt
  • Poetry_3.everycontributor.txt

The first dataset above offers a quick overview of Poetry by listing issue-level
information only, while the other two datasets offer more detailed views of the journal
by including item-level information along with the issue-level data from the first
dataset. Though the last two datasets contain much of the same data, the second one
offers an exhaustive record of every title in Poetry, while the third provides an
exhaustive record of every contributor to the magazine.

Dataset 1: Journal overview

The “journal overview” dataset contains only general information about each issue of a
magazine, drawn from the top section of the MODS file for every issue. Because it excludes
all data about the contents of individual issues (e.g., articles, letters, pictures),
this dataset is by far the smallest of the three and affords users a quick overview of
the journal as a whole. The “journal overview” dataset provides information (if recorded
in the MODS files) for each issue in the following twelve categories:

  • journal title: This is the primary title of the journal: e.g., “Egoist,”
    “Little Review.” Please note that we’ve omitted the preceding article (e.g., The), if
    it exists, from the main title of the journal, in order to facilitate meaningful
    sorting—so “The Egoist” simply appears as “Egoist”. The journal title information
    corresponds to the content of the mods:title element in the MODS file at this xpath:
    //mods:mods/mods:titleInfo/mods:title
  • journal subtitle: This is the secondary title that may appear on the cover or
    contents page of a magazine: e.g., “An Individualist Review” for some issues of
    The Egoist. This info corresponds to the content of the mods:subTitle element in the
    MODS file at this xpath: //mods:mods/mods:titleInfo/mods:subTitle
  • issue name: In addition to a subtitle, some magazine issues carry a specific name
    (usually on the cover) that conveys information about the issue’s special theme or
    contents: e.g., issue 2.5 of The Egoist is designated the “Special Imagist Number,”
    while issue 5.4 of The Little Review is the “Henry James Number.” This info
    corresponds to the content of the mods:partName element in the MODS file at this
    xpath: //mods:mods/mods:titleInfo/mods:partName
  • volume: This is the number of the volume that an issue belongs to: e.g.,
    Freewoman 2.28, Poetry 21.2. We’ve extracted this info from the
    mods:partNumber element in the MODS file at this xpath:
    //mods:mods/mods:titleInfo/mods:partNumber
  • issue: This is the number of the issue within a certain volume: e.g.,
    Freewoman 2.28, Poetry 21.2. We’ve also extracted this info from the
    mods:partNumber element in the MODS file at this xpath:
    //mods:mods/mods:titleInfo/mods:partNumber
  • date: This is the issue’s date of publication, expressed as an eight-digit
    number: YEAR-MO-DY. The date of the February 8, 1912 issue of the Freewoman thus
    appears this way in the dataset: 1912-02-08. We’ve extracted this info from the
    mods:dateIssued element in the MODS file at this xpath:
    //mods:mods/mods:originInfo/mods:dateIssued[@keyDate="yes"]; and when we processed the
    dataset, we converted the 6-digit dates of monthlies and quarterlies, as well as the
    4-digit dates of annuals, to this 8-digit expression, in order to enable comparisons
    of all MJP journals by their publication dates.
  • journal editor: This is the primary editor of the journal, listed by “last
    name, first name”: e.g., “Monroe, Harriet” of Poetry. If a journal has two main
    editors, both are listed here, in the order of their appearance in the MODS file:
    e.g., “Marsden, Dora; Gawthorpe, Mary” of The Freewoman. This info corresponds to the
    content of the mods:namePart element in the MODS file at this xpath:
    //mods:mods/mods:name/mods:namePart
  • publisher: The journal’s publisher can be either a company (Stephen Swift and
    Co. Ltd.) or an individual (Harriet Monroe); in both cases, the publisher’s name
    is written exactly as it appears in the publication (for persons, that’s generally: first name last
    name). This info corresponds to the content of the mods:publisher element in the MODS
    file at this xpath: //mods:mods/mods:originInfo/mods:publisher
  • journal location: This is where the magazine was published—usually a major
    city: e.g., London, New York, Chicago. This corresponds to the content of the
    mods:placeTerm element at this xpath in the MODS file:
    //mods:mods/mods:originInfo/mods:place/mods:placeTerm[@type="text"]
  • issue length (pp): This is the total number of pages in any issue of the
    magazine (rather than merely the issue’s numbered pages), including both front and
    back covers and all advertising pages. We’ve extracted this info from the mods:extent
    element in the MODS file at this xpath:
    //mods:mods/mods:physicalDescription/mods:extent
  • issue height (cm): This is the physical height of the issue (measured in
    centimeters). We’ve also extracted this info from the mods:extent element in the MODS
    file at this xpath: //mods:mods/mods:physicalDescription/mods:extent
  • issue width (cm): This is the physical width of the issue (measured in
    centimeters). We’ve also extracted this info from the mods:extent element in the MODS
    file at this xpath: //mods:mods/mods:physicalDescription/mods:extent
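
For readers who want to check a field against its source, here is a minimal sketch of how a few of the xpaths above could be read from a MODS file with Python's standard library. The sample record is invented for illustration, the function and dictionary names are hypothetical, and this is not the script the MJP used to build the datasets.

```python
import xml.etree.ElementTree as ET

# Standard MODS version 3 namespace.
NS = {"mods": "http://www.loc.gov/mods/v3"}

# Invented, heavily trimmed sample record (illustration only).
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo>
    <mods:title>Freewoman</mods:title>
  </mods:titleInfo>
  <mods:originInfo>
    <mods:place><mods:placeTerm type="text">London</mods:placeTerm></mods:place>
    <mods:publisher>Stephen Swift and Co. Ltd.</mods:publisher>
    <mods:dateIssued keyDate="yes">1912-02-08</mods:dateIssued>
  </mods:originInfo>
  <mods:relatedItem type="constituent">
    <mods:titleInfo><mods:title>An item title</mods:title></mods:titleInfo>
  </mods:relatedItem>
</mods:mods>"""

# Paths are relative to the root mods:mods element, so titles nested
# inside mods:relatedItem (item-level data) are not picked up here.
ISSUE_FIELDS = {
    "journal title": "mods:titleInfo/mods:title",
    "journal subtitle": "mods:titleInfo/mods:subTitle",
    "publisher": "mods:originInfo/mods:publisher",
    "journal location": "mods:originInfo/mods:place/mods:placeTerm[@type='text']",
    "date": "mods:originInfo/mods:dateIssued[@keyDate='yes']",
}

def issue_overview(xml_text):
    """Return a dict of issue-level fields; missing elements become ''."""
    root = ET.fromstring(xml_text)
    row = {}
    for field, path in ISSUE_FIELDS.items():
        node = root.find(path, NS)
        row[field] = (node.text or "").strip() if node is not None else ""
    return row

overview = issue_overview(SAMPLE)
```

Fields that are not recorded in the MODS file (here, the subtitle) come back as empty strings, mirroring the "if recorded" caveat above.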

Datasets 2 and 3: item-level data for “every title” and “every contributor”

The two remaining .txt datasets contain all of the information about a magazine that
appears in the twelve data fields above, but they additionally include information about
the individual items within the journal. Each dataset is moreover configured so every
item listed in it includes (in the same row or string) all of the information from the first
dataset about the issue in which it was published. Thus, both of the item-level datasets
for Poetry record Ezra Pound’s “In a Station of the Metro” as being a poem that appears
on page 12 of issue 2.1 of the magazine, but they also associate the poem with that
issue’s date, its physical size, its overall length, who edited and published it, where
the issue was published, and whether the issue carried a subtitle and issue name.
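
As a rough illustration of what this row layout allows, the sketch below reads a tiny invented sample and filters it by genre while keeping the issue-level fields attached to each item. The pipe delimiter, the header row, the column names, and the sample rows themselves are all assumptions made for this example; check the actual files before relying on any of them.

```python
import csv
import io

# Invented sample in an ASSUMED layout: pipe-delimited with a header row.
# The "-01" day in the date is also an assumption about how monthlies are padded.
SAMPLE = (
    "journal title|volume|issue|date|journal editor|title|genre|pages\n"
    "Poetry|2|1|1913-04-01|Monroe, Harriet|In a Station of the Metro|poetry|12-12\n"
    "Poetry|2|1|1913-04-01|Monroe, Harriet|Advertisement|advertisements|40-40\n"
)

def read_rows(handle, delimiter="|"):
    """Read every row into a dict keyed by the header names."""
    # QUOTE_NONE keeps quotation marks inside titles as ordinary characters.
    reader = csv.DictReader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return list(reader)

def items_by_genre(rows, genre):
    """Keep only the rows whose genre field matches exactly."""
    return [row for row in rows if row["genre"] == genre]

rows = read_rows(io.StringIO(SAMPLE))
poems = items_by_genre(rows, "poetry")
```

Because each item row repeats its issue's data, a filtered row such as `poems[0]` still carries the issue date and journal editor without any join against the first dataset.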

While the info in the two item-level datasets is largely the same, the second “every
title” dataset offers the most complete and accurate account of every item published in
the magazine, while the third “every contributor” dataset gives the most complete and
accurate account of every contributor to the journal. We felt we needed to create these
two versions of the item-level data since there often isn’t a one-to-one correlation
between contributors and items/titles in a magazine: some items (like ads, contents
pages, etc.) lack authors, others have more than one author, while still others have an
editor and/or translator(s) in addition to their author(s).

The “every contributor” dataset, therefore, lists together, in a single
“contributor” data field, every author, artist, editor, and translator who made a
contribution to the magazine, giving one mention per contribution. This means that items
created by multiple contributors are mentioned multiple times in this dataset, while items not
associated with a contributor are not represented at all.

The “every title” dataset, by contrast, mentions every item/title in the
magazine just once. If an item has multiple authors, they will be listed together, in a
single cell for that item, in the “creator” list (which means that you will only be able
to sort by the first author listed, which corresponds to the first author who appears in
the MODS record for that item). This dataset, however, places the “creators” (authors
and artists), “editors,” and “translators” of items in three separate data fields,
which makes this dataset additionally useful for discerning these different contributions to
the magazine.
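
The difference between the two configurations can be sketched with a small in-memory example. The item structure and function names below are hypothetical, and the form of the creator's name is a guess; the point is only to show one row per item versus one row per contribution.

```python
ROLES = ("creator", "editor", "translator")

# Hypothetical items; the second one has no contributors at all.
ITEMS = [
    {
        "title": "Chinese Written Wall Pictures",
        "creator": ["Wen Cheng-ming"],  # name form in the real data is a guess
        "editor": [],
        "translator": ["Ayscough, Florence", "Lowell, Amy"],
    },
    {"title": "Contents", "creator": [], "editor": [], "translator": []},
]

def every_title_rows(items):
    """One row per item; several names in one role share a single cell."""
    rows = []
    for item in items:
        row = {"title": item["title"]}
        for role in ROLES:
            row[role] = "; ".join(item[role])
        rows.append(row)
    return rows

def every_contributor_rows(items):
    """One row per contribution; items without names yield no rows."""
    return [
        {"contributor": name, "title": item["title"]}
        for item in items
        for role in ROLES
        for name in item[role]
    ]

title_rows = every_title_rows(ITEMS)
contributor_rows = every_contributor_rows(ITEMS)
```

Here the two items produce two "every title" rows but three "every contributor" rows, and the authorless "Contents" item drops out of the second list entirely.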

What follows is a detailed account of the seven data fields in these two datasets that
don’t also appear in the “journal overview” dataset:

  • creator (in the “every title” dataset only): This field includes any author
    or artist who has created an item that appears in the magazine. The info listed here
    corresponds to the contents of the mods:namePart element when the xpath in the MODS
    file is //mods:mods/mods:relatedItem/mods:name/mods:namePart and the content of the
    associated mods:roleTerm is “creator”.
  • editor (in the “every title” dataset only): Not to be confused with the
    “journal editor” or any other editor on the staff of the magazine, the “editor” field
    refers only to those contributors to the magazine who are identified as having edited
    a particular item within it. This happens rarely, so it’s possible that the entire
    “editor” column in the dataset for a journal will appear empty. The info listed here
    corresponds to the contents of the mods:namePart element when the xpath in the MODS
    file is //mods:mods/mods:relatedItem/mods:name/mods:namePart and the content of the
    associated mods:roleTerm is “editor”.
  • translator (in the “every title” dataset only): The info listed here
    corresponds to the contents of the mods:namePart element when the xpath in the MODS
    file is //mods:mods/mods:relatedItem/mods:name/mods:namePart and the content of the
    associated mods:roleTerm is “translator”. When there are two or more translators for
    an item, they will appear together, in the single “translator” cell for that item, in
    the order of their appearance in the MODS record: e.g., Florence Ayscough and Amy
    Lowell, the two translators of Wen Cheng-ming’s “Chinese Written Wall Pictures” in
    Poetry 13.5, appear in the dataset as “Ayscough, Florence; Lowell, Amy”.
  • contributor (in the “every contributor” dataset only): This field combines,
    in a single field/column, all “creators,” “editors,” and “translators” of items in a
    magazine. As we mentioned above, authors, artists, editors, and translators are listed
    in this field every time they contribute (or participate in contributing) an item to
    the magazine. The info in this field corresponds to the contents of the mods:namePart
    element at this xpath in the MODS file:
    //mods:mods/mods:relatedItem/mods:name/mods:namePart
  • title (in both item-level datasets): The “title” field records the names of
    individual items published in the magazine. This information appears in the MODS file
    within the mods:titleInfo element for an item at the following xpath:
    //mods:mods/mods:relatedItem/mods:titleInfo. The title info may include the contents of the
    following five elements: 1. mods:nonSort: an initial article that precedes the main
    title; 2. mods:title: the item’s primary title; 3. mods:subTitle: any secondary title
    for the item; 4. mods:partNumber: the number assigned to the item if it’s a part of a
    larger series; and 5. mods:partName: the name assigned to the item if it’s part of a
    larger series. When combined within the “title” field for an item, these five elements
    will always be arranged in the following order with the following punctuation: article
    + title: subTitle—partNumber: partName. For instance: The Reader Critic: ‘Spiritual
    Adventures’; or, Songs and Sketches—I: Night.
  • genre (in both item-level datasets): When the MJP staff create the MODS
    records for a magazine, they assign a genre to each magazine item they catalogue. The
    field of genres available to the cataloguer is currently limited to the following
    seven kinds of texts: advertisements, articles, drama, fiction, images, letters, and
    poetry; accordingly, each item in the item datasets will be associated with one of
    these seven genres. The info in the genre field corresponds to the content of the
    mods:genre element in the MODS file at this xpath:
    //mods:mods/mods:relatedItem/mods:genre
  • pages (in both item-level datasets): This field indicates the location of the
    item within the magazine. The page location is always expressed as a span of pages
    (e.g., 10-25), even if the item appears on only a single page (e.g., 8-8). If the item
    is interrupted within the magazine, multiple page spans will be listed here: e.g., 6-9
    14-15. These page numbers reflect the pagination that appears in the published
    magazine (along with the MJP cataloguer’s pagination of any unnumbered pages), rather
    than the absolute count of pages (from cover to cover) that appears in the page images
    view of the magazine on the MJP website. This page information corresponds with the
    content of the mods:start and mods:end elements at this xpath in the MODS file:
    //mods:mods/mods:relatedItem/mods:part/mods:extent
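
The title pattern described in the "title" entry above can be sketched as a small function. How the datasets handle a title with some of the five elements missing is not stated here, so the choice to drop the separator along with the missing element is an assumption, and the function name is hypothetical.

```python
def assemble_title(non_sort="", title="", sub_title="", part_number="", part_name=""):
    """Join the five title elements as: article + title: subTitle(dash)partNumber: partName."""
    # Article and main title are separated by a space.
    text = " ".join(part for part in (non_sort, title) if part)
    if sub_title:
        text += ": " + sub_title
    # Part number and part name are joined by a colon and set off by an
    # em dash (U+2014), as in the pattern above. Assumption: the dash is
    # omitted when neither part element is present.
    part = ": ".join(p for p in (part_number, part_name) if p)
    if part:
        text += "\u2014" + part
    return text

example = assemble_title(
    non_sort="The",
    title="Reader Critic",
    sub_title="\u2018Spiritual Adventures\u2019; or, Songs and Sketches",
    part_number="I",
    part_name="Night",
)
```

With all five elements supplied, `example` reproduces the sample title given in the list above.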

Additional Features of the Datasets

How the datasets are organized

Each journal dataset pulls together the information from all of the MODS files that the
MJP has created for that journal, and it arranges the info by order of publication date:
so the information for the earliest issue of that magazine will appear at the top of the
dataset and the info for the last issue will appear at the bottom. The info for each
issue in the dataset will in turn be ordered exactly as it appears within the source MODS
file: by its order of appearance, page by page, within the covers of the magazine.

When the MJP staff catalogue the contents of a magazine, they occasionally decide to embed some items in the MODS file within other items; this is generally done for those items that contain many small parts that we’d rather not show up in the contents pages for the magazine (but do want to show up in searches for the journal)—like the incidental illustrations that accompany some texts. In creating the item datasets from the MODS files, we decided to include all items, top-level and embedded both, which now will appear on the same level. In the case of poem series, this may cause some redundancy in terms of page counts, since the same item may appear twice: first as the one encompassing title, and then as the sum of its component parts.
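
Anyone totalling pages from the "pages" field may want to keep this redundancy in mind. The sketch below parses the span format described earlier (e.g., "6-9 14-15") and counts the pages in one cell. It assumes every page reference is an Arabic numeral, which may not hold for all cataloguer-supplied pagination, and the function names are hypothetical.

```python
def parse_spans(pages):
    """Turn a cell like '6-9 14-15' into [(6, 9), (14, 15)]."""
    spans = []
    for chunk in pages.split():
        start, end = chunk.split("-")
        spans.append((int(start), int(end)))
    return spans

def page_count(pages):
    """Count pages covered by one cell; a span such as 8-8 counts as one page."""
    return sum(end - start + 1 for start, end in parse_spans(pages))

interrupted = page_count("6-9 14-15")  # four pages plus two pages
single = page_count("8-8")
```

Summing `page_count` over every row of an issue can overcount, since a series title and its embedded parts may each report the same pages.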

Things we’ve left out of the datasets

Besides the article that precedes a journal’s title (see above), we’ve decided to exclude from the datasets any notes or tablesOfContents appended to the MODS record for an item, since these are often discursive accounts of mostly incidental matters, made at the discretion of the cataloguer, that don’t add much to the information about an item.

More about Names

Except in the case of publishers, the names of people in the various data fields (contributor, creator, editor, translator, journal editor) will appear as “last name, first name” (e.g., Rodker, John; Davies, Mary Carolyn; Eliot, T. S.), which facilitates meaningful sorting by last name. We also include, in parentheses after the person’s name, any term of address that has been recorded for an individual in the MODS file: e.g., Wilson, Edmund (Jr.); Van Rennselaer, Schuyler (Mrs.); von Freytag-Loringhoven, Else (Baroness). Because the datasets record names, not persons, the same person may show up in these datasets under several name variants, abbreviations, or aliases. (The MJP policy is to record the name as it appears in the magazine, so any variation or mistake in the spelling of a person’s name in the magazines will also show up in our data about it.)
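
A name cell in this form can be taken apart with a short regular expression. This is a sketch under the conventions described above (semicolons between names, an optional term of address in parentheses at the end); the function names are hypothetical, and names that don't fit the pattern are returned whole rather than guessed at.

```python
import re

# last name, first name, then an optional "(term of address)" at the end.
NAME_RE = re.compile(
    r"^(?P<last>[^,]+),\s*(?P<first>.*?)(?:\s*\((?P<address>[^)]*)\))?$"
)

def parse_name(name):
    """Split one 'last, first (address)' name into its parts."""
    name = name.strip()
    match = NAME_RE.match(name)
    if match is None:
        # No comma found: keep the whole string as the last name.
        return {"last": name, "first": "", "address": ""}
    return {
        "last": match["last"].strip(),
        "first": match["first"].strip(),
        "address": match["address"] or "",
    }

def split_names(cell):
    """Split a multi-name cell on semicolons and parse each name."""
    return [parse_name(part) for part in cell.split(";") if part.strip()]

junior = parse_name("Wilson, Edmund (Jr.)")
editors = split_names("Marsden, Dora; Gawthorpe, Mary")
```

Since the datasets record names rather than persons, parsing like this does not merge variants or aliases; that still has to be done by hand or with a separate authority list.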
