A wrapper around the parse functions that can be used to shorten all of postmastr's core code down to a single function call once dictionaries have been created and tested against the data.
pm_parse(.data, input, address, output, new_address, ordinal = TRUE,
  operator = "at", unnest = FALSE, include_commas = FALSE,
  include_units = TRUE, keep_parsed = "no", side = "right", left_vars,
  keep_ids = FALSE, houseSuf_dict, dir_dict, street_dict, suffix_dict,
  unit_dict, city_dict, state_dict, locale = "us")
Argument | Description |
---|---|
.data | A source data set to be parsed |
input | Describes the format of the source address. One of either |
address | A character variable containing address data to be parsed |
output | Describes the format of the output address. One of either |
new_address | Name of new variable to store rebuilt address in. |
ordinal | A logical scalar; if |
operator | A character scalar to be used as the intersection operator (between the 'x' and 'y' sides of the intersection). |
unnest | A logical scalar; if |
include_commas | A logical scalar; if |
include_units | A logical scalar; if |
keep_parsed | Character string; if |
side | One of either |
left_vars | A character scalar or vector of variables to place on the left-hand side of the output when |
keep_ids | Logical scalar; if |
houseSuf_dict | Optional; name of house suffix dictionary object. Standardization and parsing are skipped if none is specified. |
dir_dict | Optional; name of directional dictionary object. If none is specified, the full default directional dictionary will be used. |
street_dict | Optional; name of street dictionary object. Standardization is skipped if none is specified. |
suffix_dict | Optional; name of street suffix dictionary object. If none is specified, the full default street suffix dictionary will be used. |
unit_dict | Optional; name of unit dictionary object - NOT CURRENTLY ENABLED |
city_dict | Required for |
state_dict | Optional; name of state dictionary object. If none is specified, the full default state dictionary will be used. |
locale | A string indicating the country these data represent; the only current option is "us" but this is included to facilitate future expansion. |
An updated version of the source data with, at a minimum, a new variable containing standardized street addresses for each observation. Options allow for columns containing parsed elements to be returned as well.
# construct dictionaries
dirs <- pm_dictionary(type = "directional", filter = c("N", "S", "E", "W"), locale = "us")
sufs <- pm_dictionary(type = "suffix", locale = "us")
mo <- pm_dictionary(type = "state", filter = "MO", case = c("title", "upper"), locale = "us")
cities <- pm_append(type = "city",
                    input = c("Brentwood", "Clayton", "CLAYTON", "Maplewood",
                              "St. Louis", "SAINT LOUIS", "Webster Groves"),
                    output = c(NA, NA, "Clayton", NA, NA, "St. Louis", NA))

# add example data
df <- sushi1

# identify
df <- pm_identify(df, var = address)

# temporary code to subset unit
df <- dplyr::filter(df, name != "Drunken Fish - Ballpark Village")

# parse, full output
pm_parse(df, input = "full", address = address, output = "full", keep_parsed = "no",
         dir_dict = dirs, suffix_dict = sufs, city_dict = cities, state_dict = mo)
#> # A tibble: 27 x 4
#>    name              address                   visit   pm.address
#>    <chr>             <chr>                     <chr>   <chr>
#>  1 BaiKu Sushi Loun… 3407 Olive St, St. Louis… 3/20/18 3407 Olive St St. Louis …
#>  2 Blue Ocean Resta… 6335 Delmar Blvd, St. Lo… 10/26/… 6335 Delmar Blvd St. Lou…
#>  3 Cafe Mochi        3221 S Grand Boulevard, … 10/10/… 3221 S Grand Blvd St. Lo…
#>  4 Drunken Fish - C… 1 Maryland Plaza, St. Lo… 12/2/18 1 Maryland Plz St. Louis…
#>  5 I Love Mr Sushi   9443 Olive Blvd, St. Lou… 1/1/18  9443 Olive Blvd St. Loui…
#>  6 Kampai Sushi Bar  4949 W Pine Blvd, St. Lo… 2/13/18 4949 W Pine Blvd St. Lou…
#>  7 Midtown Sushi & … 3674 Forest Park Ave, St… 3/4/18  3674 Forest Park Ave St.…
#>  8 Mizu Sushi Bar    1013 Washington Avenue, … 9/12/18 1013 Washington Ave St. …
#>  9 Robata Maplewood  7260 Manchester Road, Ma… 11/1/18 7260 Manchester Rd Maple…
#> 10 SanSai Japanese … 1803 Maplewood Commons D… 2/14/18 1803 Maplewood Commons D…
#> # … with 17 more rows

# parse, short output
pm_parse(df, input = "full", address = address, output = "short", keep_parsed = "no",
         new_address = clean_address, dir_dict = dirs, suffix_dict = sufs,
         city_dict = cities, state_dict = mo)
#> # A tibble: 27 x 4
#>    name                 address                       visit  clean_address
#>    <chr>                <chr>                         <chr>  <chr>
#>  1 BaiKu Sushi Lounge   3407 Olive St, St. Louis, Mi… 3/20/… 3407 Olive St
#>  2 Blue Ocean Restaura… 6335 Delmar Blvd, St. Louis,… 10/26… 6335 Delmar Blvd
#>  3 Cafe Mochi           3221 S Grand Boulevard, St. … 10/10… 3221 S Grand Blvd
#>  4 Drunken Fish - Cent… 1 Maryland Plaza, St. Louis,… 12/2/… 1 Maryland Plz
#>  5 I Love Mr Sushi      9443 Olive Blvd, St. Louis, … 1/1/18 9443 Olive Blvd
#>  6 Kampai Sushi Bar     4949 W Pine Blvd, St. Louis,… 2/13/… 4949 W Pine Blvd
#>  7 Midtown Sushi & Ram… 3674 Forest Park Ave, St. Lo… 3/4/18 3674 Forest Park A…
#>  8 Mizu Sushi Bar       1013 Washington Avenue, St. … 9/12/… 1013 Washington Ave
#>  9 Robata Maplewood     7260 Manchester Road, Maplew… 11/1/… 7260 Manchester Rd
#> 10 SanSai Japanese Gri… 1803 Maplewood Commons Dr, S… 2/14/… 1803 Maplewood Com…
#> # … with 17 more rows
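
For comparison, the single pm_parse() call in the examples stands in for a longer sequence of individual parse functions. The sketch below is one plausible step-by-step equivalent of the "full output" example, not a description of how pm_parse() is implemented. The function names (pm_prep(), pm_postal_parse(), pm_state_parse(), pm_city_parse(), pm_house_parse(), pm_streetDir_parse(), pm_streetSuf_parse(), pm_street_parse(), pm_replace(), pm_rebuild()) and their arguments are assumptions based on the package's other help pages and vignette; none of them are documented on this page, so check each one before relying on it. The intermediate object names (df_min, rebuilt) are hypothetical.

```r
library(postmastr)

# dictionaries and data, built as in the examples on this page
dirs <- pm_dictionary(type = "directional", filter = c("N", "S", "E", "W"), locale = "us")
mo <- pm_dictionary(type = "state", filter = "MO", case = c("title", "upper"), locale = "us")
cities <- pm_append(type = "city",
                    input = c("Brentwood", "Clayton", "CLAYTON", "Maplewood",
                              "St. Louis", "SAINT LOUIS", "Webster Groves"),
                    output = c(NA, NA, "Clayton", NA, NA, "St. Louis", NA))

df <- pm_identify(sushi1, var = address)
df <- dplyr::filter(df, name != "Drunken Fish - Ballpark Village")

# ASSUMPTION: the calls and argument names below are not taken from this
# page; verify them against each function's own documentation.

# reduce to the unique street addresses that need parsing
df_min <- pm_prep(df, var = "address", type = "street")

# parse from the end of the address inward: postal code, state, city
df_min <- pm_postal_parse(df_min)
df_min <- pm_state_parse(df_min, dictionary = mo)
df_min <- pm_city_parse(df_min, dictionary = cities)

# parse the remaining street-level elements
df_min <- pm_house_parse(df_min)
df_min <- pm_streetDir_parse(df_min, dictionary = dirs)
df_min <- pm_streetSuf_parse(df_min)  # default suffix dictionary
df_min <- pm_street_parse(df_min, ordinal = TRUE, drop = TRUE)

# join the parsed elements back to the source data and rebuild the address
rebuilt <- pm_replace(df_min, source = df)
rebuilt <- pm_rebuild(rebuilt, output = "full", keep_parsed = "no")
```

Running the steps individually is mainly useful while dictionaries are still being tested, since each intermediate result can be inspected; once they are settled, pm_parse() is the shorter form.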