Convert a .txt dataset into various kinds of spreadsheets for data visualization.

Documentation for Using Spreadsheets

Each of the three .txt datasets that we’re making available for every journal on the MJP Lab site contains many strings of data about the journal. Each string in a dataset consists of the same kind of information in the same order (e.g., contributor info first, title info second, genre info third, etc.), and the very first string in the dataset, at the top, provides a label or header for each of the columns in the data strings that follow. Within every string in the dataset, information for each column/field is separated from the fields before and after it with a vertical bar or “pipe” symbol (|): e.g., contributor info|title info|genre info.

Configured this way, each .txt dataset is ready to be converted into a tab-delimited spreadsheet. That’s easy to do; but if you’re new to this, here are some instructions for importing the data into Excel (though the procedure may vary depending on which edition of Excel you’re using and whether you’re working with a PC or Mac):

  • After you download the dataset from the MJP Lab repository on Sourceforge, you’ll want to open the spreadsheet software on your computer and import the dataset into a blank or new spreadsheet. If you’re using Microsoft Excel on a Mac, open a New Workbook in the File menu option and then, in File, select Import (or Open on a PC).
  • In Excel, an “Import” page will now appear and ask you to identify the type of file you want to import. Select “Text file.” (On a PC, replace the default “All Excel Files” from the bototm-right menu with “All Files.”)
  • When you do so, the “Choose a File” page will open, which allows you to navigate to the dataset on your computer. Do that and then select “Get Data.” (On a PC, simply use the Open page to navigate to the text file.)
  • Next, in Excel, the first of three pages from the “Text Import Wizard” will appear. This first page will ask you to choose between “Delimited” and “Fixed width” data; choose “Delimited.” And in the “File origin” box, navigate to “65001: Unicode (UTF-8)” if any other setting comes up as the default; this will allow diacritical marks and other formatting to appear correctly in the spreadsheet. Then press “Next.”
  • On page 2 of the “Text Import Wizard,” you are asked to set the delimiters within your data. Select “Other” and type a pipe symbol in the empty box to its right. (On most computers the pipe key, which needs to be pressed along with the shift key, appears just above the return key on the far right-hand side of the keyboard.) Once you do this, you’ll also be able to preview what the data looks like in the lower half of the page; check to see that it looks properly organized. Before you move on, you might also set the “Text qualifier” option to “none.” Then press “Next” to move to the third page.
  • Page three of the Text Import Wizard allows you to select a specific Data Format for each of your text columns. The default setting is “General,” but with this setting Excel tends to read the page numbers in the file as dates, and it also formats dates in a most unuseful way. So you would do well now to select each column (first “pages,” then “date”) and select “Text” instead of “General.” Then press Finish.
  • In a PC, the data from the dataset should now appear in your spreadsheet workbook, with the labels of each column appearing in the top line. On a Mac, an “Import Data” page will pop up; simply select “OK” to have the data populate a spreadsheet workbook.
  • When saving the file on a PC, you may have to replce the default “Tab Delimited” with “Excel Workbook.”

How to manipulate a spreadsheet

Adjsuting column widths: On a PC, you may have to manually expand the width of each column to read the full header and see its full contents. You can do this by going to the row at the top of the spreadsheet that contains letters for each column (A, B, C, etc.) and then double-clicking on the border between columns; this should cause the preceding column to snap to the width of its longest content.

You can also use the Sort and Filter command to organize the contents of any column by alphabetical or numeral order.

How to create a pivot table

One of the most useful ways to manipulate the spreadsheet is to create a pivot table. By allowing you to isolate and relate a couple of categories, a pivot table will let you create (for instance) alphabetized lists of unique contributors or calculate the number of items for each genre. Here’s how:

  • Open the spreadsheet in Excel, and select two or more columns: one column should represent the data you want to know more about, and the other column should contain info about every row in the first column. If the columns are separated in the spreadsheet by other columns, you can hide those columns or simply select all of them, since you’ll be able to select just two at a later step.
  • Then, on a Mac, select Data / Pivot Table Report from the Excel menu to open the Pivot Table Wizard. For the first page, just press “Next” since your spreadsheet is a “Microsoft Excel list or database,” the default setting, and also press “Next” on Step 2.
  • Then on Step 3, press Layout. The Layout table should appear, with your two columns represented on the far right if you highlighted just two. (If you highlighted more than two, you’ll have to select the columns you want to compare from the multiple categories listed on the right.) Drag into the “Row” box the column of data you’re interesting in, and then drag into the “Data” box the other category that provides a baseline for it. For instance, if you want to produce an alphabetized list of unique contributors, you will drag “contributor” into the “Row” box, and into the “Data” box the other column—perhaps “title,” since every contributor contributed a title (any category will do here that doesn’t include blanks for any contributors). Then press OK—and when Step 3 appears again, press “Finish.”
  • A new sheet/tab will now appear on the spreadsheet, which indicates, in the first column, every contributor to the journal, listed alphabetically, and in the second column to the right, how many contributions each contributor made.
  • The procedure is quite different (and more streamlined) on a PC. After selecting two or more columns, select the “Insert” tab and then select “PivotTable” from the far left menu.
  • A Create PivotTable Wizard will now open, which should list the range of data (number of columns) you previously selected. Press OK.
  • A new worksheet for the pivot table will now appear, with your selected column categoies appearing in the PivotTable Field list at the far right. In this panel, check now the boxes for the two columns you want to work with. When you do, the categories will appear in the Row Labels box below. You’ll want to leave here the category you want to know more about, and drag to the Values box, to the right, the other category. Thus, to produce an alphabetized list of contributors that calculates the number of their contributions (or titles), leave the “creator” category in the Row Labels box and move the “title” category to the Values box, which will cause the number of titles for each contributor to be counted.
  • The pivot table, with the information you desire, should now appear immediately to the left. If you like, you can now close the pivot table panel on the right.
Back to top

Back to Top