Adds two identification numbers to raw data that will be used for matching parsed data with the original, raw data once parsing is complete. One (pm.id) uniquely identifies each observation, while the other (pm.uid) uniquely identifies each distinct street address. A variable named pm.type is also created that attempts to identify the type of address contained in each observation. If pm_identify is re-run, only pm.type will be updated.

pm_identify(.data, var, intersect_dict, locale = "us")

Arguments

.data

A tbl or data frame

var

A character variable containing address data to be parsed

intersect_dict

A dictionary object with intersection identifiers

locale

A string indicating the country these data represent; "us" is currently the only option, and the argument is included to facilitate future expansion.

Value

A tibble with the pm.id, pm.uid, and pm.type variables added in the first three positions of the data set.

Details

postmastr functions are designed to operate on unique street addresses rather than on an entire data set, which reduces the amount of parsing work when addresses repeat. The pm.uid number facilitates matching between processed and original data, while the observation identification number pm.id preserves the original sort order of the data. These variable names should not be changed. Subsequent functions applied to the prepared data check for their presence before executing and will error if they are not found.
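
The role of the two identifiers can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frame, the `parsed` column, and the use of `toupper()` as a stand-in for parsing are all assumptions made for illustration, as is numbering pm.uid in order of first appearance.

```r
# Toy data (hypothetical): rows 1 and 3 share the same address string.
raw <- data.frame(
  name = c("A", "B", "C"),
  address = c("1 Main St", "22 Oak Ave", "1 Main St"),
  stringsAsFactors = FALSE
)

# Observation id: records the original row order.
raw$pm.id <- seq_len(nrow(raw))

# Address id: one integer per distinct address string.
# Assumption: ids are numbered in order of first appearance.
raw$pm.uid <- match(raw$address, unique(raw$address))

# Do the expensive work once per distinct address rather than once per row.
unique_addr <- raw[!duplicated(raw$pm.uid), c("pm.uid", "address")]
unique_addr$parsed <- toupper(unique_addr$address)  # stand-in for real parsing

# Match the processed addresses back to every original observation via pm.uid,
# then restore the original sort order via pm.id.
out <- merge(raw, unique_addr[, c("pm.uid", "parsed")], by = "pm.uid")
out <- out[order(out$pm.id), ]
```

In this sketch only two addresses are processed even though the data contain three observations, and renaming either identifier would break the final join and re-sort, which is why the variable names need to stay intact.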