The Digital Harrisburg Project: Placing the Population of a Progressive Era City
Our article describes the work of the Digital Harrisburg Project in placing the population of Pennsylvania's capital city on geocoded historical maps in 1900–1930. We argue that geocoded census data—population tied to precise locations in a GIS—marks a game-changer for creating fine-grained historical pictures of human mobility and changing urban diversity. Because historical federal census tables recorded information about race, immigration, occupation, property value, and home address, the historian has the power to study patterns of residence that relate to complex forces such as regional and global immigration, economic change, and urban reform. Census records and historical maps are hardly unproblematic, however, and require care in analysis and interpretation. The article highlights the digital project and data, the challenges of digitizing demographic records and geocoding urban space, and potential applications for rethinking historical problems such as City Beautiful.
City Beautiful, Harrisburg, spatial history, census, digital history
The "digital turn" in historiography has inaugurated a new era for reconsidering watershed events in Pennsylvania history and the American past more broadly. The rapid digitization of historical photographs and documents, together with new frameworks, platforms, and tools, has made it easier to create and share the stories of local places. The increasing availability of massive demographic datasets such as federal census tables, city directories, and immigration records from Ellis Island has multiplied evidence for interpreting the past.1 Increasingly intuitive geographic information systems programs have allowed analysts to visualize the changing physical and social fabric of cities and landscapes. The field of digital history, which initially centered on the creation of websites, sharing of sources, and presentation of the past online, also now includes a range of analytical toolsets and approaches for rethinking historical problems.
Many digital history projects have tended to use new tools to create digital correlatives to analog originals in, for example, developing websites devoted to teaching and popularizing a subject; blogging about places or periods; digitizing historical records, maps, and photographs; and creating online archives that curate primary sources. However, as a recent workshop held at George Mason University underscored, fewer projects have used digital tools to craft historical arguments and interpretations.2 We argue in this article, and in other pieces in this special issue, that geocoded census data, which associates human populations with precise locations in a GIS, marks a game-changer for creating fine-grained historical pictures of human mobility and the changing diversity of urban communities, and for addressing major historical questions related to the Progressive Era. Because historical federal census tables preserve information about race, immigration, occupation, property value, and (after 1880) home address, the historian has the power to study historical and spatial patterns of residence according to a wide range of social and economic attributes. Viewed over time, geocoded census data can reveal active, growing, and socially mobile populations influenced by broader forces such as regional and global immigration, economic trends, and urban reform.
Our goal in this article is to describe the work of the Digital Harrisburg Project in placing the population of Pennsylvania's capital city on geocoded historical maps in 1900–1930 and its potential for the historical analysis of a Progressive Era city. This article will highlight our work and data, the challenges and problems of digitizing demographic records and geocoding urban space, and some of the potential applications for rethinking historical problems such as City Beautiful. Other articles in this special issue offer fuller analyses and interpretations from these datasets and describe their applications for public humanities projects as well. We describe in the final section how the reader may download demographic and geospatial datasets for their own analysis.
The Digital Harrisburg Project
The Digital Harrisburg Project began in January 2014 when three professors—David Pettegrew, Albert Sarvis, and Jeff Erikson—dedicated their classes in Digital History and Geospatial Technology to creating a high-resolution demographic database and map of the population of Harrisburg at the turn of the twentieth century. Professors Sarvis and Erikson had students in GIS classes at Harrisburg University and Messiah College use the popular software ArcGIS to georeference and digitize the historical maps of the 1901 Harrisburg Title Company Atlas and trace the buildings of the city. Students in digital history, for their part, worked on inputting analog scans of federal census records from 1900 in spreadsheets and a database. Although there were different goals for these assignments that were specific to the classes, we hoped that the project would facilitate undergraduate research and new modes of thinking about historical problems and create the possibility for new engagements and collaborations with the broader Harrisburg community.
We directed our attention to understanding Harrisburg's City Beautiful movement, the urban improvement program at the turn of the twentieth century that changed a filthy industrial center into a modern and beautiful capital city with extensive green spaces, miles of newly paved roads, up-to-date water filtration systems, and a glimmering capitol building. City Beautiful was a good subject for student research for obvious reasons. It was relevant to current discussions about improving the capital city today, especially the City Beautiful 2.0 movement launched in 2013 that has aimed to embrace the spirit of the original City Beautiful. The Harrisburg improvement campaign, moreover, was historically significant among urban reform programs in its early occurrence, its broad popular support within the city, its popularization of the term "City Beautiful," and its inauguration by a woman.3
Most important, the City Beautiful movement entailed the vote of the majority of the adult male population in favor of a major bond issue in 1902 and resulted in the remaking of an industrial city that affected everyone in the city in one way or another. Reformers redefined space through a new urban plan that converted swamps to parks, carved out places for recreation, strengthened and beautified the riverfront, and revamped buildings and monuments. Many in the city were involved in the planning, constructing, and/or maintaining of buildings, parks, and avenues, while virtually everyone experienced the effects of reform through new parks, changing property values, and transformation of built environments. In one dramatic case, the state's development of the new Capitol building and surrounding park (direct results of the City Beautiful movement) entailed the demolition of an entire neighborhood between 1913 and 1919. In this respect, we recognized that a demographic and geospatial approach would allow us to consider Harrisburg's improvement campaign through the social networks of reformers, the effects of improvement on the city, and the population who supported or opposed reform but felt its effects all the same.
The collaborative project turned out to be a great success as well as a great deal of work.4 By the late spring of 2014, students had linked federal census data for 24,000 Harrisburgers—half the population of the city in 1900—to a contemporary digitized map in GIS. The project continued in summer and fall through a student-faculty working group, which encoded the rest of the population in 1900, and the following year, as students and faculty refined data, improved matches between demographic datasets and GIS data, and keyed additional federal census years for Harrisburg, neighboring Steelton, and parts of Lancaster. With institutional support from Messiah College and Harrisburg University, and the involvement of students (as work studies, interns, and students in history and technology courses), the project teams have subsequently created and refined datasets from the federal censuses of Harrisburg between 1900 and 1930 and other historical sources (e.g., membership rolls from the Municipal League for Civic Improvement); georeferenced five historical maps of the city (1880, 1886, 1901, 1905, and 1929); digitized and geocoded the maps of 1901 and 1929; added property values and occupation taxes; and digitized historical manuscript groups from local archives. Today, datasets include over 250,000 individual records of Harrisburg's residents for the years 1900, 1910, 1920, and 1930, some of the property values for buildings in 1900, and a growing geospatial dataset representing more than 10,000 individual historical residences from 1901 and 1929. Initial analysis of patterns of immigration, race, and social networks presented at historical conferences pointed to the fruitfulness of spatial approaches for understanding City Beautiful.5
Beyond analysis of population and GIS, one of the important developments of the project was its public capacity and uses that we did not initially predict (cf. Pettegrew and LaGrand, "Harrisburg, the City Beautiful: Recasting the History of Urban Reform in a Small American Capital," this issue). The launching of an interactive map in 2015 attracted local media attention and created opportunities for residents to explore the city as it appeared in cartographic representation 120 years ago, and also to search for relatives. Dr. Jean Corey led humanities students at Messiah College in hosting poetry workshops centered around the lives of people in the lost Old Eighth Ward. And the launching of the Commonwealth Monument Project, organized by Lenwood Sloan of the IIPT Harrisburg Peace Promenade (a project of the Foundation for Enhancing Communities, fiscal sponsor), drew our energies to public ends of digitized data, maps, and records in 2019–2020, as we explored the Old Eighth Ward and history of African American, Jewish, and immigrant communities in the city.
This collaboration between faculty and students of the humanities and technological fields, as well as community organizations, offers a range of new possibilities for seeing and telling the history of Harrisburg's City Beautiful movement in spatial terms. In the rest of this article, we outline the nature of the demographic and spatial data, and the challenges and potential applications for studying urban progressivism, a theme that other articles in this special issue will develop.
Digitizing Population: Census Data
Geocoded population data promise to play a greater role in future digital humanities projects and historical studies of urbanism. In their physical analogue form, the United States federal census records have limited value for the historian interested in questions about the entire social body of a city. An analyst may be able to find information about particular individuals in the records, or even summarize attributes of an entire population of a particular neighborhood or smaller community, as John Bodnar did in his important historical study of immigrant and African American laborers and residents in Steelton, but tabulating an entire city with a population of 50,000 or more is time-consuming and tedious, and counting by a combination of census fields—race and literacy, for example—is practically impossible.6 The digitization of records and use of databases offers a way to tackle complex demographic problems.
Most historians who have made use of census records for demographic analysis (as opposed to genealogical ends) have processed them in aggregate form as a platform for discussing mobility in American social life. The Integrated Public Use Microdata Series (IPUMS) of the Minnesota Population Center, for example, has released comprehensive data of 650 million individual records from the decennial censuses of 1850–1940, a large microdata resource for longitudinal analysis of American mobility, coded for statistical analysis, and stripped of names or addresses in accordance with its agreement with Ancestry.com.7
Closer to home, Gerald Eggert made significant use of census data three decades ago to study social mobility in Harrisburg's industrial era. At a time when databases were just becoming available for commercial use and computerization was slow, Eggert and his students patiently typed out all the census tables for the city from 1850–70 into the mainframe computer at Penn State University. While these censuses did not record street address—and could not have been used for precise spatial analysis—Eggert discussed the data at both a macro level to aggregate broad demographic patterns related to the effects of industrialization on the population, and as a primary text that revealed specific individual residents and occupations. He showed how industrialization came late to Harrisburg and was not particularly disruptive or explosive as it was in other larger towns (work in craft, for example, continued alongside new manufacturing jobs). Ethnic minorities—Germans, Irish, and African Americans—had different experiences and trajectories in employment and social mobility from 1850 to 1900 based on their backgrounds.8
These studies demonstrate the importance of digitizing census records for understanding American urban society but are not primarily based in spatial analysis. Let us now turn to the datasets of the Digital Harrisburg Project to see how geocoding census data provides new horizons for historical and public uses.
Digitized Censuses for Harrisburg, 1900–1930
The federal census of Harrisburg in June 1900, six months before the conservationist Mira Lloyd Dock delivered her compelling speech for a beautiful city, entailed some forty-two census recorders going door to door collecting information on 50,167 residents in the city, including even the 148 prisoners in the local county jail. The enumerators, who were themselves residents of the districts they recorded, were armed with federal census schedules and a formal set of instructions for classifying the populations. They recorded twenty-eight separate fields for each resident, including name, location, and relationship to the head of the household; personal description such as race, sex, age, marital status, and children; birthplace and family background; citizenship and immigration status; occupation and profession; education; and ownership of property.
Digitizing this census and subsequent censuses required adopting and following a consistent workflow. Using the scans of census schedules available in Ancestry, we developed spreadsheets that generally paralleled the analog originals (figs. 1 and 2). Students digitized fields that offered the greatest potential for analyzing social and ethnic groups (such as race, age, birthplace, occupation, and property ownership), understanding population distribution across space (ward, enumeration district, street, and address), and connecting data to the enumeration itself (enumeration district and census sheet), but, for better or worse, we decided not to key other fields such as date of birth, years in United States, naturalization, attended school, owned free or mortgage, farm or house, and number of farm schedule. We employed consistent field names across all spreadsheets (see fig. 2) and a consistent set of rules for entering data. In cases of illegible handwriting, for example, students typed out the letters they could read, indicated illegible characters with an ellipsis of three dots, and highlighted those spreadsheet cells for later checking. In the case of the age field, an infant recorded as three months old had to be converted to a decimal value (0.25 of a year) readable by computer. Consistent workflow also extended to nomenclature. Students created one Excel worksheet for each sheet (100 names) of the federal census and labeled every file using a standard name: W01_D43_S11-14_Pettegrew contained census data digitized by Pettegrew for Ward 1, Enumeration District 43, Sheets 11-14. A digital history class completed half the census for 1900 during spring 2014. Work-study students and interns continued the challenging work of digitizing and refining other census years (1910, 1920, 1930). They eventually imported all census spreadsheets into a Microsoft Access database.
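The entry rules described above lend themselves to simple automated checks. The following is a minimal sketch in Python, not the project's own tooling: the regular expression, the function names, and the rounding to two decimals are all assumptions made for illustration, based only on the file name and the age example quoted in the text.

```python
import re

# Hypothetical pattern for names such as "W01_D43_S11-14_Pettegrew":
# ward, enumeration district, sheet or sheet range, transcriber.
FILENAME_PATTERN = re.compile(
    r"^W(?P<ward>\d+)_D(?P<district>\d+)"
    r"_S(?P<first>\d+)(?:-(?P<last>\d+))?_(?P<transcriber>.+)$"
)


def parse_sheet_filename(name):
    """Split a worksheet file name into its labeled parts."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    first = int(match["first"])
    last = int(match["last"] or first)  # a single sheet has no range
    return {
        "ward": int(match["ward"]),
        "district": int(match["district"]),
        "sheets": list(range(first, last + 1)),
        "transcriber": match["transcriber"],
    }


def age_in_years(years=0, months=0):
    """Express an age recorded in years and months as decimal years."""
    return round(years + months / 12, 2)


def needs_review(cell):
    """Flag transcriptions that contain the three-dot illegibility marker."""
    return "..." in cell
```

Under these assumptions, `parse_sheet_filename("W01_D43_S11-14_Pettegrew")` yields Ward 1, Enumeration District 43, and Sheets 11 through 14, and `age_in_years(months=3)` gives the 0.25 mentioned above. A check like `needs_review` could stand in for the manual cell highlighting when rows are later reviewed in bulk.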
These digitized censuses contain mistakes and inconsistencies, reflecting the processes of the original enumeration and our own digitization, which we worked to reduce through computer applications such as OpenRefine. Most of our efforts have centered on creating reliable ID field values that indicate the exact address of a recorded household and establish the link to geocoded data. An ID value of 1088NINTH, for instance, corresponds to a precise location in Harrisburg, 1088 Ninth Street. Reliable ID data requires disambiguation of street names: North Street is not North Avenue, and Calder Street is not Calders Farm, while 42 Fourth Street must be correctly named as 42NFOURTH or 42SFOURTH to indicate its location north or south of Market Street. Refining data has required identifying and resolving the wide variation in the analog federal census records.
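To make the ID convention concrete, here is a small Python sketch of how such keys might be assembled. It reproduces the two examples given in the text (1088NINTH and 42NFOURTH), but the function name and the rule for suffixes other than "Street" are assumptions; the project's actual rules for names such as North Avenue are not stated here.

```python
def make_address_id(house_number, street_name, direction=""):
    """Build a compact address key such as 1088NINTH or 42NFOURTH.

    Assumption: a trailing "Street" (or "St.") is dropped, while other
    suffixes such as "Avenue" or "Farm" are kept so that similar names
    remain distinct. The direction is "N", "S", or empty.
    """
    words = street_name.upper().replace(".", "").split()
    if len(words) > 1 and words[-1] in {"STREET", "ST"}:
        words = words[:-1]
    return f"{house_number}{direction.upper()}{''.join(words)}"
```

With this rule, "North Street" and "North Avenue" produce different keys, which is the kind of disambiguation the paragraph above describes. The direction still has to be supplied by a person or a lookup, since the census sheet alone may not say whether an address lies north or south of Market Street.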
The databases for the populations of Harrisburg in 1900–1930 provide a resource, however incomplete and flawed, for characterizing the city's population. In one respect, they offer a tool for learning about the population according to one or two basic attributes. A database query can quickly tell us that 15% of Harrisburg households in 1900 were run by females, that 6.5% of the city's population were boarders scattered among 15% of the households of the city, that a very small percentage of the population (1.5%) had at least one domestic servant, cook, maid, or other hired laborer or assistant living with them, and that very few people (half a percent of the population) were divorced.
Databases also reveal a good deal about the age structure of the population. It is perhaps no surprise to find that the average Harrisburger in 1900 was only twenty-eight years old, but queries allow one to define and limit criteria to determine how many people were under the age of forty (74%, or three-quarters of the population), how many were under the age of 18 (33%), or how many were little children (9% were aged 0–5). One can very easily count the older adults of Harrisburg and calculate that less than 2% of the population was seventy or older, 0.3% over the age of eighty, and a tiny group (only twelve women, all widows) over the age of ninety. Queries allow one to pull some or all of the information about these individuals to determine that the oldest person in Harrisburg in 1900 was ninety-nine-year-old Betsy Lewis, a white woman born in Pennsylvania in 1801, who had five children, none of whom were living, and was then residing at the Home of the Friendless Institute.
Databases are more analytically powerful for describing a population, however, when queries stack multiple demographic attributes. A database allows one to see that while the average mother in Harrisburg had only 2.4 living children out of 3.3 born, black mothers had experienced more loss on average (56.4% of children living) than white mothers (74.6%); that only 96% of foreign immigrants could speak English compared to 99% of native-born Americans; that some immigrant groups had lower literacy rates (contrast the literacy rates of the white native-born population (>95%) with those of immigrants from Poland (37% could read, 27% could write), China (56% read, 67% write), Russia (81% read, 79% write), Austria (82% read, 78% write), Italy (77% read, 75% write), and Ireland (87% read, 83% write)); or that only 82% of the African American population could read and 78% could write.
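A stacked query of this kind can be sketched with Python's built-in sqlite3 module. The table layout, column names, and sample rows below are invented for illustration; they are not the project's Access schema or its census figures. The query filters on one attribute (mothers with at least one child) and groups by a second (race), which is the pattern behind the comparison of children living to children born.

```python
import sqlite3


def share_of_children_living(rows):
    """Return {race: percent of children still living} for mothers.

    rows: iterable of (race, children_born, children_living) tuples.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mothers (race TEXT, children_born INTEGER, "
        "children_living INTEGER)"
    )
    con.executemany("INSERT INTO mothers VALUES (?, ?, ?)", rows)
    query = """
        SELECT race,
               ROUND(100.0 * SUM(children_living) / SUM(children_born), 1)
        FROM mothers
        WHERE children_born > 0      -- filter on one attribute
        GROUP BY race                -- then group by a second one
        ORDER BY race
    """
    result = dict(con.execute(query).fetchall())
    con.close()
    return result


# Invented sample rows, not figures from the Harrisburg census.
sample = [("B", 4, 2), ("B", 3, 2), ("W", 4, 3), ("W", 2, 2), ("W", 0, 0)]
```

Calling `share_of_children_living(sample)` returns one percentage per race code. Adding a further column to the GROUP BY clause (birthplace or ward, for instance) would stack a third attribute in the same way.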
Census databases, in short, give the researcher excellent tools for characterizing the population at the specific moment of their recording and asking questions to find specific answers or follow hypotheses. When successive censuses are compared over time, they provide a view of a changing population within a single urban center. And when datasets are geocoded to digitized urban space, they unlock tremendous analytical potential for visualizing and exploring the city. The population data also reveals a very human side of the census itself that goes beyond the problems of digitization we noted above. Before we turn to the geospatial tools that illuminate demographic patterns in space, it is important to draw attention to the historical dimensions of the federal censuses.
The Contingent Character of Census Data
Numbers and statistics can create the illusion of objective facts, but here we wish to underscore the contingent human elements of demographic information. Scholarship on historical federal censuses over the last generation has shown that past census schedules suffered the same problems of categorization, incompleteness, loss, and bias as recent censuses.9 The most important historical study on the subject, Margo Anderson's The American Census: A Social History (1988, second edition 2015), has demonstrated just how much national politics and the question of slavery (and, later, race) determined the kinds of information collected from human subjects and the reliability of that information on a macro level.10 One need only look at the debates surrounding the citizenship question for the 2020 census to be reminded of the political debates that surround any given census.
Census data also presents problems at the level of its local execution. Most of the enumerators in Harrisburg's censuses were ordinary people with a basic education enlisted to count the populations of their own neighborhoods.11 They made mistakes, misspelled names, and forgot to record fields or recorded incorrectly.12 They met deaf residents, who could not hear the questions being asked, and individuals who had forgotten pertinent details about their family members (such as the woman in 1910 who could not recall the age of her husband).13 They encountered resistant and suspicious residents who doubted the federal government's promise of privacy, as well as immigrants who could understand neither the questions in English nor the purpose of the census.14 In other cases, they failed to find people at home, whether because residents were out of the city or, as enumerators suspected, because of deliberate evasion.15
Enumerators counted some people twice in the same census: the prominent Neal family involved in iron works manufacturing was recorded in 1900 as living at both 21 North Front and 21 North Third Streets. They also recorded the misinformation given to them. As Eggert showed in his study of the city's African American population for an earlier period, many black residents lied about their birthplace in 1850 and 1860 at a time when an origin in a slave state was considered a danger.16 Harrisburg residents themselves complained in 1910 about a suspected serious undercount (64,186) compared to the numbers reported in the city directory, noting that Professor J. Howard Wert "demonstrates almost conclusively that the official numbering of the people of Harrisburg is under the mark."17 Policemen and letter carriers, intelligent men, better acquainted with the districts' precinct political workers, contemporaries argued, would have done a better job than the "average citizens" with only a basic education who actually carried out the survey.18
More problematically, enumerators imposed their own categories. Our analysis of the race field in the 1900 census, for example, has shown categories ("Colored" and "Mulatto") that were outside the range of acceptable answers in published instructions that year, reflecting the unique categorical views of individual enumerators (and perhaps a desire to downplay the presence of black residents living in a mostly white neighborhood).19 In the 1900 census, there were no specific instructions about how to categorize occupation or industry, which left it to individual enumerators to make those decisions. A tiny sample of occupation data from the 1900 census of Harrisburg demonstrates the challenges for the modern-day analyst: apprentice (artist), apprentice (baker), apprentice (barber), apprentice (blacksmith), apprentice (book binder), apprentice (compositor), apprentice (electric), apprentice (electrician), apprentice (machines), apprentice (machinist), apprentice (milliner), apprentice (molder), apprentice (molding), apprentice (pattern), apprentice (pattern maker), apprentice (plumbing), apprentice (press work). Such categorical imprecisions and inconsistencies pose difficulties for historical patterning.20
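One way an analyst might begin to tame such variation is to split each entry into a role and a trade and then map variant spellings to a single label. The Python sketch below is only an illustration: the synonym table is hypothetical and covers a few of the pairs listed above, and any real mapping would need to be built and justified by reviewing the records themselves.

```python
import re
from collections import Counter

# Matches entries such as "apprentice (pattern maker)" or plain "laborer".
ENTRY = re.compile(r"^\s*(?P<role>[^()]+?)\s*(?:\((?P<trade>[^()]*)\))?\s*$")

# Hypothetical lookup of variant -> preferred label.
TRADE_SYNONYMS = {
    "electric": "electrician",
    "machines": "machinist",
    "molding": "molder",
    "pattern": "pattern maker",
}


def normalize_occupation(entry):
    """Return (role, trade) with known trade variants merged."""
    match = ENTRY.match(entry.lower())
    if match is None:
        return (entry.strip().lower(), None)
    trade = match["trade"]
    if trade is not None:
        trade = trade.strip()
        trade = TRADE_SYNONYMS.get(trade, trade)
    return (match["role"], trade)


def count_occupations(entries):
    """Tally normalized occupations across a list of raw entries."""
    return Counter(normalize_occupation(e) for e in entries)
```

Under this lookup, "apprentice (molder)" and "apprentice (molding)" are counted together, while entries absent from the table pass through unchanged. Every merge is an interpretive decision, so a table like this is best kept alongside the data where it can be inspected and revised.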
Beyond mistakes and miscounting, the questions raised by the census questionnaires changed over time, as we noted above, complicating a simple comparison. Consider the problem of race or color. Between the 1900 and 1910 censuses, 845 "mulatto" residents suddenly appeared in the city out of nowhere, reflecting new instructions for categorizing population based on approximation of "some proportion or perceptible trace of negro blood."21
In short, the historical censuses are as contingent and problematic as any primary source that the historian uses to make sense of the past. Any detailed analysis of the population and its spatial distribution must recognize at the outset the fragmentary, errant, and incomplete character of the dataset. Be that as it may, we maintain that the census can provide a good sense of the general patterns at the local level.
Mapping the City: Geospatial Data
Geospatial technologies create the capacity to visualize and contextualize complex data within a map environment. Analyses of spatial patterns such as dispersal and clustering yield insight into the character of a dataset. The spatial context provided by mapping data often reveals relationships between a dataset of interest and existing mapped features. As one well-known historic example, the epidemiologist Dr. John Snow simply mapped cholera deaths in the nineteenth century to reveal their spatial correlation to contaminated water pumps.22 Recent research in historical GIS shows plentiful other examples of the potential of maps in seeing patterns of space in new ways.
When maps are tied to population data, the historian and geographer gain the capacity to visualize the social and economic diversity of urban and rural spheres in spatial terms. Beginning with the 1880 census, enumerators recorded the location of every individual in urban space according to not only ward and precinct, but also physical street address. Because high-resolution, contemporary fire-insurance maps also recorded street addresses for every building in the city, it is possible to associate one dataset with another in GIS and, in doing so, map social, economic, and cultural layers of the city according to their unique property identification number. Locating individual citizens precisely in geocoded maps, however, has its own unique challenges that reflect the variability of census enumerations (outlined above), mapping programs, and the processes of digitization.
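The association of the two datasets amounts to a join on a shared address key. As a rough Python sketch, assuming each census row carries an address ID and each mapped building is stored under the same kind of ID (the function name and data layout are hypothetical, not the project's GIS workflow):

```python
def link_residents_to_buildings(residents, buildings):
    """Attach building coordinates to census rows that share an address ID.

    residents: list of dicts, each with an "id" key (e.g. "1088NINTH").
    buildings: dict mapping address ID -> (x, y) map coordinates.
    Returns (matched_rows, unmatched_ids).
    """
    matched, unmatched = [], []
    for person in residents:
        location = buildings.get(person["id"])
        if location is None:
            # No building with this address on the map: keep for review.
            unmatched.append(person["id"])
        else:
            matched.append({**person, "x": location[0], "y": location[1]})
    return matched, sorted(set(unmatched))
```

The list of unmatched IDs matters as much as the matches. It points to addresses that were mis-keyed, recorded inconsistently by an enumerator, or missing from the map, which are the challenges noted in the paragraph above.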
The Digital Harrisburg Project sought to map thirty years of individual census records to their place of residence within the rapidly changing physical structure of early twentieth-century Harrisburg. The following discussion highlights the primary activities required for this project goal, the iterative and often disjointed nature of the multiyear project timeline, the new spatial datasets that were developed, and the problems and potential of their use.
Mapping Activities: Base Maps and Georeferencing
The most critical element of this historical mapping project was establishing a base map to serve as the framework for all subsequent feature placement. This meant first importing digital scanned maps in raster format such as TIFF and JPG. The Digital Harrisburg group initially selected its base map after review of various sources, including several series of the well-known Sanborn Fire Insurance Maps.23 Because the history team initially was keying the 1900 federal census, it was important of course to import a map that was closely contemporary with that data. The 1901 Harrisburg Title Company Atlas was in this sense the best source for mapping census data. Compiled from plans, deeds, and surveys, it included details on street names and addresses, property owners, lot widths, and building materials (brick, wood, stone). The Historical Society of Dauphin County generously provided twenty-two high-resolution digital files of the map representing an atlas index page (fig. 3) and twenty-one individual plates (fig. 4) spanning the extent of the city in 1901. Only in subsequent years, as the history team added later census enumerations, did the geospatial team also work to georeference and digitize the 1929 Sanborn maps.
Following the selection of maps, the teams defined their projections (coordinate systems for representing curved space in flattened form), and georeferenced the maps (scaling and aligning them to known geographic points or benchmarks). They accomplished this by using ESRI's ArcGIS software georeferencing tools in combination with scanned historical fire insurance maps. Scanned map images that contained multiple, noncongruent areas of the city on one plate were duplicated and georeferenced separately for each area. Modern aerial photos, authoritative road centerlines, and building footprints served as the basis for alignment. Where buildings and road intersections have remained essentially unchanged, the team used historic road intersections or building corners to create matches. Merging numerous distinct map plates necessarily created spatial distortions relative to modern aerial photographs and GPS ground control points, in part because they were not perfectly mapped in the first place in the early twentieth century.
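The core idea of georeferencing can be illustrated with the simplest case: an affine transformation fitted to exactly three control points, each pairing a pixel position on the scanned plate with a known map coordinate. The Python sketch below is a conceptual illustration only and is not how ArcGIS is implemented; GIS software typically accepts many more control points, fits them by least squares, and reports the residual error. All names and coordinates here are hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def _solve3(m, v):
    """Solve the 3x3 linear system m * u = v with Cramer's rule."""
    d = _det3(m)
    if d == 0:
        raise ValueError("control points must not lie on one line")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = v[r]
        solution.append(_det3(replaced) / d)
    return solution


def fit_affine(control_points):
    """Fit pixel -> map coordinates from three ((px, py), (wx, wy)) pairs."""
    if len(control_points) != 3:
        raise ValueError("this sketch needs exactly three control points")
    m = [[px, py, 1.0] for (px, py), _ in control_points]
    a, b, c = _solve3(m, [world[0] for _, world in control_points])
    d, e, f = _solve3(m, [world[1] for _, world in control_points])

    def to_world(px, py):
        return (a * px + b * py + c, d * px + e * py + f)

    return to_world
```

For example, with control points `((0, 0), (100, 200))`, `((10, 0), (120, 200))`, and `((0, 10), (100, 180))`, the fitted function scales pixels by two map units and flips the vertical axis, as is usual when image rows run downward. An affine fit can only shift, scale, rotate, and shear a plate; it cannot remove the local distortions described above, which is one reason they persist after alignment.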
The maps, once imported, georeferenced, and merged, required cleanup, editing, and clipping of images to show the mapped area of interest. The maps contained map collar white space, for example, that would obscure surrounding maps when viewed together. A Harrisburg University team used ESRI's Mosaic Dataset footprint tool to accomplish this trimming. By clipping collars, the GIS specialists were able to create a seamless historical map of 1901 Harrisburg by the end of 2014.
Digitizing: Tracing Building Footprints
Raster image files created from scanning historical maps provided a visual reference but could not, on their own, be used to directly link tabular census data to precise address locations in GIS. An intermediate step was necessary in which teams "digitized" the raster images. This process entailed the creation of shape files containing digital traces (vector polygons) of the buildings on the scanned map images as well as associated attribute tables. Student teams from Harrisburg University and Messiah College created these digitized building footprints first from the raster images of the 1901 Harrisburg Title Company Atlas and later from the 1929 Sanborn maps.
Students carefully reviewed the details of building footprints in these map sources to outline the extent of each structure and determine the correct address to assign to the digitized polygon. Address notations written on the street-facing side of each building provided the address information that was associated with the polygons in attribute tables. Over the course of several years, the Digital Harrisburg teams digitized over 11,000 buildings from the 1901 maps and 23,000 buildings from the georeferenced 1929 Sanborn maps. Figures 5 and 6 show detailed views of this digitization and address allocation of the 1901 and 1929 maps.
Digitizing two distinct maps of Harrisburg, however, created and highlighted another layer of inaccuracies: discrepancies between cartographic representations of the city. In retrospect, it became clear that maps published by two different companies in 1901 and 1929 would yield differences (some small, some large) that had nothing to do with changes in urban space over the thirty-year period. The same physical building depicted on a 1929 Sanborn map, for example, may have existed in the same form and place a generation earlier, without exactly matching the shape and location recorded in 1901. Indeed, in some cases, the maps showed differences in location of ten to fifteen feet that were more the result of differences in map production than shifts in the physical structure of individual buildings. The process of scanning aged physical maps created its own distortions, which were exacerbated during the process of georeferencing. This compounding series of distortions produced two distinct spatial datasets, one of Harrisburg in 1901 and another in 1929, which related in complicated and sometimes unclear ways to the actual transformations of the urban center over three decades. To return to the example from above, we wondered whether a visible difference in the location or shape of a building (or, for that matter, the width of a street) from 1901 to 1929 was the product of refurbishment and rebuilding, mapping programs, or digitization and georeferencing.
These discrepancies encouraged us to find a different solution for comparing the shape of the city in two separate years, one that was not reliant on building footprints alone and that could be associated with the demographic data. Our solution, described in the next section, was to geocode the buildings with centroids (points) using an ArcGIS software tool, "Create Centroid," which finds the geographic center of a polygon feature and places a point there along with all its attributes. The use of points rather than polygons for mapping demographic patterns resolved the tensions of trying to reconcile two geospatial datasets based on historically contingent processes.
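To make the centroid step concrete, the following is a minimal sketch in Python of how the geographic center of a simple building footprint can be computed with the standard area-weighted (shoelace) formula. It is not the ArcGIS tool the project used; the function name polygon_centroid and the sample footprint coordinates are hypothetical illustrations.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(vertices: List[Point]) -> Point:
    """Return the area-weighted centroid of a simple polygon.

    Vertices are listed in order around the ring; the ring need not be
    closed. This is an illustrative stand-in, not the ArcGIS tool.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # The centroid formula divides by 6 * area, i.e. 3 * twice_area.
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


# Hypothetical rectangular footprint, 20 by 40 units, in arbitrary map units.
footprint = [(0.0, 0.0), (20.0, 0.0), (20.0, 40.0), (0.0, 40.0)]
centroid = polygon_centroid(footprint)
```

For the rectangular footprint above the point falls at the middle of the rectangle. For irregular footprints (an L-shaped building, for instance) a true centroid can fall outside the polygon, which is one reason a point layer derived this way still needs review.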
Geocoding: Placing Harrisburg's Historical Population
Linking the demographic data described earlier in the article with the geospatial data requires a process known as geocoding, which assigns explicit coordinates to digital objects. It is worth noting that locations in GIS can be categorized as either explicit or implicit. The address details of each resident of Harrisburg in the federal censuses are implicit: one may infer that 150 Main Street lies halfway along the 100–200 block of Main Street, but that inference does not necessarily produce an accurate latitude/longitude location (i.e., an explicit location). Geocoding supplies explicit locations in a couple of different ways, most commonly by interpolating address locations within the range of addresses along either side of a single block of road centerline. Initial attempts to geocode Harrisburg census data in this way did not yield good results. Because of significant changes to the Harrisburg street network and address ranges, and the imprecise interpolation of address locations along street segments, this method did not accurately place families from the federal census. For example, certain addresses were clustered together rather than evenly distributed across street segments. Some families were misplaced by hundreds of feet.
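The interpolation approach described above can be sketched in a few lines of Python. This is a simplified illustration, not the ArcGIS address locator: it assumes a straight street segment and evenly spaced house numbers, and the function name interpolate_address and the segment coordinates are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def interpolate_address(house_number: int,
                        range_start: int,
                        range_end: int,
                        segment_start: Point,
                        segment_end: Point) -> Point:
    """Estimate a location by linear interpolation along a street segment.

    Assumes house numbers are spread evenly between the two ends of the
    block, which is the simplification that makes this method imprecise.
    """
    if range_start == range_end:
        raise ValueError("address range must span more than one number")
    fraction = (house_number - range_start) / (range_end - range_start)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("house number falls outside the address range")
    x = segment_start[0] + fraction * (segment_end[0] - segment_start[0])
    y = segment_start[1] + fraction * (segment_end[1] - segment_start[1])
    return (x, y)


# 150 Main Street on the 100-200 block; the block is drawn here as a
# hypothetical 400-unit segment running east along the x axis.
location = interpolate_address(150, 100, 200, (0.0, 0.0), (400.0, 0.0))
```

The sketch places 150 Main Street exactly halfway along the segment. Because real lots are rarely spaced evenly and address ranges change over time, the estimate can diverge considerably from where a household actually lived.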
Ultimately the addresses captured in the building digitization process were joined to the building centroids to produce a map layer of address points coincident with the building centroids. Using ArcGIS's geocoding tools (fig. 7), these discrete point features were used to obtain a one-to-one relationship between census records and address locations. Addresses were concatenated in a House Identifier attribute field known as the "HID." A HID value corresponded to an exact address: 429STATE, for example, represented a specific location at 429 State Street. The HID value was unique to a specific place on the map and was linked to the same identifier in the census database. When the two datasets are associated in ArcGIS, in short, the information about the family living at 429 State Street connects with polygons and points of the 1901 and 1929 digitized maps (fig. 8). This process removed the ambiguity of placement along an address range, but it relies on accurate address points being placed at the center of every building. Work continues to create a comprehensive address point layer that is applicable to every decennial census.
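A minimal Python sketch of this identifier-based join follows. The only detail taken from the project is the form of the identifier (429STATE for 429 State Street); the function names, the rule of dropping the street type, the coordinates, and the sample records are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def make_hid(house_number: str, street_name: str) -> str:
    """Concatenate house number and street name, e.g. '429' + 'State' -> '429STATE'.

    Assumes the street type ("Street", "Avenue") has already been dropped.
    """
    return (house_number.strip() + street_name.strip()).replace(" ", "").upper()


def place_records(records: List[dict],
                  address_points: Dict[str, Point]) -> Tuple[List[dict], List[dict]]:
    """Split census records into those matched to an address point and those not."""
    placed: List[dict] = []
    unplaced: List[dict] = []
    for record in records:
        hid = make_hid(record["house_number"], record["street"])
        if hid in address_points:
            placed.append({**record, "hid": hid, "xy": address_points[hid]})
        else:
            unplaced.append({**record, "hid": hid})
    return placed, unplaced


# Hypothetical address point layer keyed by HID (coordinates are invented).
address_points = {make_hid("429", "State"): (100.0, 250.0)}

# Hypothetical census rows; names are placeholders, not real residents.
census_records = [
    {"name": "Resident A", "house_number": "429", "street": "State"},
    {"name": "Resident B", "house_number": "12", "street": "Elm"},
]

placed, unplaced = place_records(census_records, address_points)
```

Keying both tables on the same string means every resident at an address lands on one point, and any record whose identifier has no counterpart in the point layer is set aside for review rather than placed by guesswork.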
The second half of successful geocoding is the match rate between the spatial reference and the tabular address data. Street names, street prefixes/suffixes, and street numbers in both the tabular data and point features must all match for an address to be located. Data entry errors, spelling errors, omitted street prefixes/suffixes, and other issues with both the census records and the spatial dataset have all contributed to match rates below 100%. The match rate of census records to physical locations ranges from as high as 90% in 1900 and 1930 to as low as 84–85% in 1910 and 1920 (see table 1), which means that 5,000–11,000 residents are missing from the maps for the different decades. Efforts continue to isolate and correct these discrepancies to produce the highest match rate possible.
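As a rough illustration of how a match rate can be measured and how inconsistent suffixes depress it, the Python sketch below normalizes address strings before comparing them. The abbreviation table, function names, and sample addresses are all hypothetical; this is not the project's workflow, and a real cleanup would need a far fuller set of rules.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical alias table mapping suffix variants to one spelling.
SUFFIX_ALIASES = {"STREET": "ST", "STR": "ST", "AVENUE": "AVE", "AV": "AVE"}


def normalize_address(raw: str) -> str:
    """Uppercase, strip punctuation, and standardize common street suffixes."""
    tokens = re.sub(r"[^A-Za-z0-9 ]", " ", raw).upper().split()
    return " ".join(SUFFIX_ALIASES.get(token, token) for token in tokens)


def match_rate(census_addresses: List[str],
               reference_addresses: Iterable[str]) -> Tuple[float, List[str]]:
    """Return the share of census addresses found in the reference layer,
    along with the original strings that failed to match."""
    reference = {normalize_address(a) for a in reference_addresses}
    unmatched = [a for a in census_addresses
                 if normalize_address(a) not in reference]
    total = len(census_addresses)
    rate = (total - len(unmatched)) / total if total else 0.0
    return rate, unmatched


# Invented sample data: two entries differ only in formatting, one address
# is absent from the reference layer, and one contains a spelling error.
reference = ["429 State St", "105 N. Second St"]
census = ["429 State Street", "105 n second st.", "77 Walnut St", "431 Stat St"]
rate, unmatched = match_rate(census, reference)
```

In this invented sample, normalization recovers the two entries that differ only in formatting, while the misspelled and absent addresses remain unmatched. Returning the unmatched strings alongside the rate gives a worklist for manual correction.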
While work continues to refine and improve the accuracy and completeness of the historic spatial and tabular data, the datasets produced thus far are a robust source for testing spatial analysis techniques. Over the initial five years of the project the following datasets have been produced: georeferenced historic maps from 1884 (36 plates), 1890 (49 plates), 1894, 1901, 1905 (87 plates), and 1929 (93 plates); building footprints with centroids for geocoding from 1901 (11,072) and 1929 (23,358); and initial geocoded census records for 1900, 1910, 1920, and 1930. As final data improvements are completed, these datasets will be made available for distribution to the wider community. This release is significant because it provides the foundation and framework for future work, a historic spatial information ecosystem on which to base visualization, storytelling, and analysis of Harrisburg, Pennsylvania.
Digital Harrisburg Datasets: Availability and Potential
We see three avenues for historians, geospatial technologists, and demographers to make use of geocoded census data in the study of City Beautiful and other urban phenomena.
One approach is macro-level analysis of broad spatial patterning of the population in terms of demographic attributes. This kind of analysis sheds light on specific historical problems connected to the distribution of ethnicity, race, immigration, and socioeconomic factors. The chapters that follow by Albert Sarvis, Kostis Kourelis and David Pettegrew, and Sarah Wilson Carter consider how broad residential patterns relate to factors such as ethnic and immigrant communities, property value, displaced neighborhoods, and the spread of disease.
A second approach is micro-level analysis that aims to study the distribution of specific individuals over time. The article by Rachel Williams, for example, records the migration of individual families out of the Old Eighth Ward to new destinations in 1920 and 1930.
Finally, the linking of population to GIS provides valuable opportunities for public engagement and collaboration. Interactive and searchable story maps tied to population and multimedia offer the potential for many public history projects. Several of the articles in a later section on the Old Eighth Ward make use of geocoded census data to this end.
We hope that other researchers who have a little technological background, or know someone who does, may be able to do more with this data than we have ourselves. To that end, we have made available some of the datasets discussed in this article for download, analysis, and tinkering at the Digital Harrisburg website.24 The reader should keep in mind the contingent and incomplete nature of these datasets. The data is not "final" but represents the level of refinement at the time of publication of this article. Further refinements will result in subsequent versions of these datasets.
David Pettegrew is Professor of History and Archaeology at Messiah College. He coordinates and directs the historical projects of the Digital Harrisburg Initiative.
Albert Sarvis is Assistant Professor of Geospatial Technology and Project Management at Harrisburg University of Science and Technology. He coordinates and directs the geospatial projects of the Digital Harrisburg Initiative.
1. "Passenger Search," The Statue of Liberty – Ellis Island Foundation, https://www.libertyellisfoundation.org/passenger/records.
2. In fall 2017 GMU hosted a workshop focused on the analytical potential of digital history: "Arguing with Digital History: A Workshop on Using Digital History to Make Arguments for Academic Audiences."
3. Dr. George P. Donehoo, Harrisburg: The City Beautiful, Romantic and Historic (Harrisburg: E. J. Stackpole, 1927), 169–92; William H. Wilson, The City Beautiful Movement (Baltimore: Johns Hopkins University Press, 1989), 126–46; Michael Barton, Life by the Moving Road: An Illustrated History of Greater Harrisburg (Woodland Hills, CA: Windsor Publications, 1983), 83–89; Gerald G. Eggert, Harrisburg Industrializes: The Coming of Factories to an American Community (University Park: Pennsylvania State University Press, 1993), 338; Susan Rimby, Mira Lloyd Dock and the Progressive Era Conservation Movement (University Park: Pennsylvania State University Press, 2012), 41–64.
4. Cf. Pettegrew and LaGrand, this issue, for a discussion of the broader context, as well as our acknowledgment of students and faculty who contributed to the work.
5. See the presentations at various history-related conferences: Pennsylvania Historical Association: Rachel Carey, "Demographic Data and the Vote for Beauty" (presentation, annual meeting of the Pennsylvania Historical Association, Harrisburg, PA, October 10, 2015); David K. Pettegrew, "Mapping Harrisburg's Social Diversity and Campaign for Improvement" (presentation, annual meeting of the Pennsylvania Historical Association, Harrisburg, PA, October 10, 2015); Albert Sarvis, "Visualizing Population Mobility in Harrisburg, 1900–1920" (presentation, annual meeting of the Pennsylvania Historical Association, Harrisburg, PA, October 10, 2015). Modern Greek Studies Association: David K. Pettegrew, "Placing the Greek-American Immigrant: Digital and Demographic Approaches to Mapping Migration in the Progressive Era" (presentation, biennial symposium of the Modern Greek Studies Association, Stockton, NJ, November 3, 2017). American Historical Association: David K. Pettegrew, "Mapping the Social Diversity of a Progressive-Era City from 300,000 Names" (presentation, annual meeting of the American Historical Association, Washington, DC, January 5, 2018); Albert Sarvis, "Visualizing the Mobility of Population in Harrisburg, 1900–30" (presentation, annual meeting of the American Historical Association, Washington, DC, January 5, 2018).
6. John Bodnar, Immigration and Industrialization: Ethnicity in an American Mill Town, 1870–1940 (Pittsburgh: University of Pittsburgh Press, 1977).
7. Minnesota Population Center, "IPUMS Complete Data Count, 1790–1940," n.d., IPUMS USA, https://usa.ipums.org/usa/complete_count.shtml; Steven Ruggles, "Big Microdata for Population Research," Demography 51, no. 1 (2014): 287–97. For longitudinal analysis, cf., among many, Joseph P. Ferrie, "History Lessons: The End of American Exceptionalism? Mobility in the United States since 1850," Journal of Economic Perspectives 19, no. 3 (2005): 199–215; Thomas N. Maloney, "Ghettos and Jobs in History: Neighborhood on African American Occupational Status and Mobility in World War I–Era Cincinnati," Social Science History 29, no. 2 (2005): 241–67.
8. Eggert, Harrisburg Industrializes, 209–62.
9. E.g., Robert P. Swierenga, "Historians and the Census: The Historiography of Census Research," Annals of Iowa 50, no. 6 (1990): 650–73; Richard H. Steckel, "The Quality of Census Data for Historical Inquiry: A Research Agenda," Social Science History 15, no. 4 (1991): 579–99; Martha Hodes, "The Mercurial Nature and Abiding Power of Race: A Transnational Family Story," American Historical Review 108, no. 1 (2003): 84–118; David J. Hacker, "New Estimates of Census Coverage in the United States, 1850–1930," Social Science History 37, no. 1 (2013): 71–101.
10. Margo J. Anderson, The American Census: A Social History, 2nd ed. (New Haven: Yale University Press, 2015).
11. "Test for Enumerators," Harrisburg Telegraph, December 27, 1909, 6. In addition to American citizenship and residence within their enumeration districts, requirements included being physically fit, honest, and trustworthy, and having plain and efficient writing. Enumerators had to take a simple test involving filling out a sample census record.
12. Eggert, Harrisburg Industrializes, 379 n. 14, for a tallying problem in the 1850 census.
13. Harrisburg Telegraph, April 27, 1910, 1.
14. Those reluctant to divulge information for privacy concerns faced potentially hefty fines or imprisonment: "It's costly to Snub Census Enumerators," Harrisburg Telegraph, April 16, 1910, 1. Among the immigrants, the Chinese residents were seen as deliberately evasive, while Russians, Lithuanians, and Poles were fearful of answering questions: "Counting Noses. The Census Enumerators Busy All Over Town," Harrisburg Telegraph, June 2, 1900, 1; "Almost Finished. The Census Taking Is Nearly Completed," Harrisburg Telegraph, June 9, 1900, 1. Cf. Harrisburg Telegraph articles from 1900: June 9 (p. 1), June 14 (p. 1), and June 19. In the case of the city's Chinese, Russian, Lithuanian, Polish, and Italian residents, enumerators found locals (rabbis, residents, foremen, laundrymen) who could interpret.
15. "The Papers Helped. Census Men Endorses Publication of the Questions," Harrisburg Telegraph, June 5, 1900, 1. Cf. Star Independent, November 10, 1910.
16. Gerald Eggert, "'Two Steps Forward, A Step-and-a-Half Back': Harrisburg's African-American Community in the 19th Century," Pennsylvania History 58, no. 1 (1991): 13. "By a margin of two-to-one in 1860 the shifts favored safety, in 1870 they shifted in the direction of candor by a margin of three-to-one."
17. So, an editorial on November 12, 1910, from the Star-Independent discusses an earlier piece by Prof. J. Howard Wert, "Census Takers Handicapped Here. Congested Conditions Hinder the Work in Harrisburg," Star-Independent, November 10, 1910. Thanks to Mr. Calobe Jackson, Jr. for sharing this source with us.
18. Nov. 12, 1910, Harrisburg Daily Independent, November 12, 1910, 6.
19. For the questionnaires of historical censuses see https://www.census.gov/history/www/through_the_decades/.
20. Eggert, Harrisburg Industrializes, 152–54, for similar observations for the 1850–70 censuses. In 1910 the addition of an "Industry" field helped to disambiguate occupation from industry.
21. See page 28 of the 1910 instructions guide: https://www.census.gov/history/pdf/1910instructions.pdf.
22. John Snow, On the Mode of Communication of Cholera (London: John Churchill, 1855).
23. Sanborn Company maps were obtained from https://libraries.psu.edu/about/collections/sanborn-fire-insurance-maps