Adds two identification numbers to raw data that will be used
for matching parsed data with the original, raw data once parsing
is complete. One (pm.id) uniquely identifies each observation,
while the other (pm.uid) uniquely identifies each distinct street
address. A variable named pm.type is also created that attempts to
identify the type of address contained in a given variable.
If pm_identify is re-run, only pm.type will be updated.
pm_identify(.data, var, intersect_dict, locale = "us")
Argument | Description |
---|---|
.data | A tbl or data frame |
var | A character variable containing address data to be parsed |
intersect_dict | A dictionary object with intersection identifiers |
locale | A string indicating the country these data represent; the only current option is the default, "us" |
A tibble with the pm.id, pm.uid, and pm.type variables added
in the first three positions of the data set.
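
A minimal usage sketch follows. The example data frame and its column name address are hypothetical, and the call assumes that intersect_dict may be omitted so that a default intersection dictionary is used; if that assumption does not hold for your installed version, supply a dictionary explicitly. The call is guarded so the script still runs where postmastr is not installed.

```r
# Hypothetical example data; the column name `address` is an assumption.
addresses <- data.frame(
  name = c("Cafe A", "Cafe B", "Cafe C"),
  address = c("100 Main St", "200 Oak Ave", "100 Main St"),
  stringsAsFactors = FALSE
)

# Only run the call when postmastr is available.
if (requireNamespace("postmastr", quietly = TRUE)) {
  # Assumption: intersect_dict is optional and falls back to a default.
  identified <- postmastr::pm_identify(addresses, var = "address")

  # Per the return value described above, the first three columns are
  # expected to be pm.id, pm.uid, and pm.type.
  print(names(identified)[1:3])
}
```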
postmastr functions are designed to operate on unique street addresses
rather than on an entire data set, which increases speed and performance.
The pm.uid number facilitates matching processed data back to the original
data, while the observation identification number pm.id preserves the
original sort order of the data. Do not rename these variables: subsequent
functions applied to the prepared data check that they are present before
executing, and will error if they are not found.
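
The idea behind the two identifiers can be sketched in base R. This is an illustration only, not postmastr's implementation: the toy data, the house_number column, and the way pm.uid is numbered here (by order of first appearance) are all assumptions made for the example.

```r
# Toy data: rows 1 and 3 share the same street address.
raw <- data.frame(
  name = c("Cafe A", "Cafe B", "Cafe C"),
  address = c("100 Main St", "200 Oak Ave", "100 Main St"),
  stringsAsFactors = FALSE
)

# Observation identifier: the row position, which records the sort order.
raw$pm.id <- seq_len(nrow(raw))

# Address identifier: one integer per distinct address
# (numbered by first appearance; an assumption for this sketch).
raw$pm.uid <- match(raw$address, unique(raw$address))

# Do the expensive work once per distinct address, not once per row.
unique_addresses <- raw[!duplicated(raw$pm.uid), c("pm.uid", "address")]

# Stand-in for real parsing: take everything before the first space.
unique_addresses$house_number <- sub(" .*$", "", unique_addresses$address)

# Join the parsed result back to every observation via pm.uid,
# then restore the original order via pm.id.
rebuilt <- merge(raw, unique_addresses[, c("pm.uid", "house_number")],
                 by = "pm.uid")
rebuilt <- rebuilt[order(rebuilt$pm.id), ]
```

Here only two addresses are parsed for three observations; on data with many repeated addresses, the saving grows accordingly.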