A table in human readable format Part III

Last two times I showed you how easily you can extract information from EPPO SQLite database, put it into easily understandable table and how to make use of EPPO REST API. Below I show you I will show you how to download files straight into list and use to expand table. Lets prepare our workflow by defining some variables e.g. EPPOcodes.

library('dplyr'); library('tidyr'); library('DBI'); library('RSQLite');
library('magrittr'); library('kableExtra'); library('knitr'); library('RCurl');
library('jsonlite')

# define pest list and connect to SQL database
pestTest <- c('Xylella fastidiosa',
             'Emerald ash borer',
             'Meloidogyne fallax',
             'Cercospora angolensis',
             'cadang-cadang')
sqlEppo <- dbConnect(RSQLite::SQLite(),
                     dbname = 'eppocodes.sqlite')


# match names from list with sql DB
sqlEppo %>% 
  dbGetQuery(paste0('SELECT codeid, fullname, codelang, preferred 
                    FROM t_names WHERE fullname LIKE ',
                    paste(paste0("'%", pestTest, "%'"),
                    collapse = " OR fullname LIKE "))) -> pestInDB

codeIdDB <- unique(pestInDB$codeid)
 
sqlEppo %>% 
  dbGetQuery(paste0('SELECT codeid, fullname, preferred, codelang, status 
                    FROM t_names WHERE codeid IN (',
                    paste(codeIdDB, collapse = ', '), ')')) %>%
  filter(status == 'A') -> pestCorNames

#query sql for codeid and eppocode
sqlEppo %>% 
  dbGetQuery(paste0('SELECT codeid, eppocode FROM t_codes WHERE codeid IN (',
                    paste0(codeIdDB, collapse = ', '),')')) %>% 
  arrange(eppocode) -> EPPOcodes

#variables for distribution
distriFiles <- paste0('distribution/',EPPOcodes[,2],'dFiles')
distriLists <- paste0(EPPOcodes[,2],'distri')
distriUrls <- paste0('https://gd.eppo.int/taxon/',
                     EPPOcodes[,2],'/download/distribution_csv')

First we define empty list, and we create distriTable which combines EPPO code, code id and names of our distribution variables. Then using simple loop, just use read.csv() with links as an argument to download files straight into list. Finally we use names we created before for our distribution lists to name list elements (with the same names as in distriTable).

#define empty list
distriList <- vector(mode='list', length(distriUrls))
EPPOcodes %>% mutate(distriLists = distriLists) -> distriTable
#download and distribution files to list
for (i in 1:length(distriList)) {
    distriList[i][[1]] <- read.csv(file = distriUrls[i],
                                   header = T, stringsAsFactors = F)
}
names(distriList) <- distriLists

Now, all the information we gathered from EPPO csv files on pest distribution are in one list. Using bind_rows() from dplyr package we merge them into one data frame. You notice that we use piping operator %$% instead of %>%. That is because %>% is used when function on the right side does have built-in data argument, otherwise you can use %$%. In few next steps of pipeline we make some cleaning – like getting rid of records that have absent Status or changing long country names into something shorter. Eventaully we select only columns that will be used in next steps and nest table by country and continent.

bind_rows(distriList, .id = 'distriLists') %$%
  .[c(grep('Absent', Status, invert = T)),] %>%
  mutate(country = recode(country, 
                          'United States of America'     = 'USA'),
         country = recode(country, 
                          'United Kingdom'               = 'UK'),
         country = recode(country, 
                          "Korea Dem. People's Republic" = ' North Korea')) %>%
  select(distriLists, continent, country) %>%
  unique() %>%
  nest(continent, country) -> nestDistro

So, now are base (we have a data frame of distributions for each pest) is nearly ready. We need to make further adjustments and cleaning - to have everything in one data frame with all the distribution records for each species in single cell. We also want our records in some order i.e. name of the continent, countries, name of next continent, and so on. Below you find a for loop, that goes through every data frame in our nested variable, group by continents, merge country names, once again merge with continent names, and finally create one cell containing all the names for each pest. If you are lost in some point, try to execute below code chunk, piece by piece to find out what is going on.

Now lets join our distriTable with nested table in which we mutate() data column with unlisted result from previous code chunk.

kable(right_join(distriTable, nestDistro %>%
                                mutate(data = unlist(distroIndex))),
      'html') %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
## Joining, by = "distriLists"
codeid eppocode distriLists data
66001 AGRLPL AGRLPLdistri America: Canada; USA/ Asia: China; Japan; North Korea; Korea, Republic; Mongolia; Taiwan/ Europe: Russia
64718 CCCVD0 CCCVD0distri Asia: Philippines
410 CERCAN CERCANdistri Africa: Angola; Burundi; Cameroon; Central African Republic; Comoros; Congo; Congo, Democratic republic of the; Cote d’Ivoire; Ethiopia; Gabon; Gambia; Ghana; Guinea; Kenya; Mozambique; Nigeria; Rwanda; Sierra Leone; Tanzania; Togo; Uganda; Zambia; Zimbabwe/ Asia: Yemen
63576 MELGFA MELGFAdistri Europe: Belgium; France; Germany; Netherlands; Switzerland; UK/ Oceania: Australia; New Zealand
3243 XYLEFA XYLEFAdistri America: Argentina; Brazil; Canada; Costa Rica; Mexico; Paraguay; Puerto Rico; USA; Venezuela/ Asia: Iran; Taiwan/ Europe: France; Italy; Spain
92965 XYLEFF XYLEFFdistri America: Costa Rica; Mexico; USA/ Asia: Taiwan/ Europe: Spain
70536 XYLEFM XYLEFMdistri America: Argentina; Brazil; Paraguay; USA/ Europe: France; Spain
70537 XYLEFP XYLEFPdistri America: Argentina; Brazil; Costa Rica; Ecuador/ Europe: Italy; Spain
70538 XYLEFS XYLEFSdistri America: USA

OK. As we managed to make the last part of our Table in human readable format we are able to show final results. So lets put all the code altogether! THIS CODE NEEDS A LOT OF REFACTORING AND SIMPLIFYING

## Joining, by = c("catLists", "eppocode")
## Joining, by = c("codeid", "eppocode")
## Joining, by = c("codeid", "eppocode")
Preferred_name Alternative_names eppocode EPPO EU categorization hosts distribution taxonomy
Agrilus planipennis Synonym: Agrilus feretrius, Agrilus marcopoli, Agrilus marcopoli ulmi; Other languages: agrile du frêne, emerald ash borer AGRLPL A2 list 2004 NA 2009 Annex I/A1 2009 NA 2014 Asia: Kazakhstan A1 list 2017/ RPPO/EU: EAEU A2 list 2016; NAPPO Alert list 2002 Major: Fraxinus americana, Fraxinus angustifolia subsp. oxycarpa, Fraxinus chinensis, Fraxinus excelsior, Fraxinus japonica, Fraxinus lanuginosa, Fraxinus nigra, Fraxinus ornus, Fraxinus pennsylvanica, Fraxinus quadrangulata, Fraxinus rhynchophylla, Fraxinus uhdei, Fraxinus velutina; Minor: Fraxinus, Fraxinus mandshurica; Incidental: Chionanthus virginicus, Juglans ailanthifolia, Juglans mandshurica, Pterocarya rhoifolia, Ulmus davidiana; Artificial: Olea europaea America: Canada; USA/ Asia: China; Japan; North Korea; Korea, Republic; Mongolia; Taiwan/ Europe: Russia Arthropoda
Xylella fastidiosa Synonym: Grapevine Pierce’s disease agent, Xylella fastidiosa subsp. fastidiosa, Xylella fastidiosa subsp. piercei; Other languages: Anaheim disease, California vine disease, dwarf disease of alfalfa, dwarf disease of lucerne, enanismo de la alfalfa, enfermedad de Pierce, leaf scorch of almond, leaf scorch of maple, maladie de Pierce, Pierce’s disease of grapevine XYLEFA A2 list 1981 NA 2017 c(“Emergency measures”, “Annex I/A2”) c(2015, 1992) c(NA, NA) c(NA, 2017) Africa: Southern Africa A1 list 2001/ America: Argentina A2 list 1995; Brazil A1 list 1992; Canada A1 list 1995; Chile A1 list 1992; Paraguay A1 list 1992; Uruguay A2 list 1992/ Asia: Bahrain A1 list 2003; Israel Quarantine pest 2009; Jordan Quarantine pest 2007/ Europe: Russia A1 list 2014; Turkey A1 list 2007; Ukraine A1 list 2010/ Oceania: New Zealand Quarantine pest 2000/ RPPO/EU: COSAVE A2 list 1992; IAPSC A1 list 1989; NAPPO Alert list 2001 Major: Citrus sinensis, Coffea, Nerium oleander, Olea europaea, Polygala myrtifolia, Prunus persica, Vitis vinifera, woody plants; Minor: Acacia dealbata, Acacia saligna, Acer pseudoplatanus, Acer rubrum, Anthyllis hermanniae, Artemisia arborescens, Asparagus acutifolius, Calicotome villosa, Carya illinoinensis, Catharanthus, Cistus creticus, Cistus monspeliensis, Cistus salviifolius, Citroncirus, Citrus, Coronilla valentina, Cytisus scoparius, Dodonaea viscosa, Eremophila maculata, Erigeron bonariensis, Erigeron sumatrensis, Erysimum, Euphorbia terracina, Fortunella, Genista corsica, Genista ephedroides, Genista x spachiana, Grevillea juniperina, Hebe, Helichrysum italicum, Heliotropium europaeum, Laurus nobilis, Lavandula angustifolia, Lavandula dentata, Lavandula stoechas, Lavandula x heterophylla, Lavandula x intermedia, Liquidambar styraciflua, Medicago sativa, Metrosideros excelsa, Morus alba, Morus rubra, Myoporum insulare, Myrtus communis, Pelargonium fragrans, Pelargonium graveolens, Persea americana, Phagnalon saxatile, Phillyrea latifolia, Platanus occidentalis, Poncirus trifoliata, Prunus angustifolia, Prunus armeniaca, Prunus avium, Prunus cerasifera, Prunus domestica, Prunus dulcis, Prunus salicina, Quercus palustris, Quercus rubra, Quercus suber, Rhamnus alaternus, Rosa Cluster-flowered bush hybrids, Rosa multiflora, Rosmarinus officinalis, Spartium junceum, Streptocarpus, Ulmus americana, Vaccinium corymbosum, Vaccinium virgatum, Vinca minor, Vitis, Vitis labrusca, Westringia fruticosa, Westringia glabra, x Citrofortunella microcarpa; Wild/Weed: Chenopodium album, Cyperaceae, Poaceae, Sorghum halepense America: Argentina; Brazil; Canada; Costa Rica; Mexico; Paraguay; Puerto Rico; USA; Venezuela/ Asia: Iran; Taiwan/ Europe: France; Italy; Spain Bacteria
Xylella fastidiosa subsp. fastidiosa NA XYLEFF Absent c(“Annex I/A2”, “Emergency measures”) c(1992, 2015) c(NA, NA) c(2017, NA) NA Unclassified: Coffea, Erysimum, Nerium oleander, Polygala myrtifolia, Prunus avium, Prunus dulcis, Rosmarinus officinalis, Streptocarpus, Vitis vinifera America: Costa Rica; Mexico; USA/ Asia: Taiwan/ Europe: Spain Bacteria
Xylella fastidiosa subsp. multiplex Other languages: échaudure des feuilles du prunier, leaf scald of plum, leaf scorch of elm, leaf scorch of mulberry, leaf scorch of oak, leaf scorch of plane, peach phony agent, peach virus 4, phony disease of peach, prunus virus 3 XYLEFM Absent c(“Annex I/A2”, “Emergency measures”) c(1992, 2015) c(NA, NA) c(2017, NA) NA Unclassified: Acacia dealbata, Acer pseudoplatanus, Anthyllis hermanniae, Artemisia arborescens, Asparagus acutifolius, Calicotome villosa, Carya illinoinensis, Cistus creticus, Cistus monspeliensis, Cistus salviifolius, Coronilla valentina, Cytisus scoparius, Genista corsica, Genista ephedroides, Genista x spachiana, Hebe, Helichrysum italicum, Lavandula angustifolia, Lavandula dentata, Lavandula stoechas, Lavandula x heterophylla, Metrosideros excelsa, Myrtus communis, Pelargonium graveolens, Phagnalon saxatile, Polygala myrtifolia, Prunus cerasifera, Prunus dulcis, Quercus suber, Rosa Cluster-flowered bush hybrids, Rosmarinus officinalis, Spartium junceum, Vaccinium America: Argentina; Brazil; Paraguay; USA/ Europe: France; Spain Bacteria
Xylella fastidiosa subsp. pauca Other languages: chlorose variégée des agrumes, citrus variegated chlorosis agent, variegated chlorosis of citrus XYLEFP Absent c(“Annex I/A2”, “Emergency measures”) c(1992, 2015) c(NA, NA) c(2017, NA) NA Major: Olea europaea; Unclassified: Acacia saligna, Asparagus acutifolius, Catharanthus, Chenopodium album, Cistus creticus, Citrus, Coffea, Dodonaea viscosa, Eremophila maculata, Erigeron bonariensis, Erigeron sumatrensis, Euphorbia terracina, Grevillea juniperina, Heliotropium europaeum, Laurus nobilis, Lavandula angustifolia, Lavandula stoechas, Myoporum insulare, Myrtus communis, Nerium oleander, Pelargonium fragrans, Phillyrea latifolia, Polygala myrtifolia, Prunus avium, Prunus dulcis, Rhamnus alaternus, Rosmarinus officinalis, Spartium junceum, Vinca, Westringia fruticosa, Westringia glabra America: Argentina; Brazil; Costa Rica; Ecuador/ Europe: Italy; Spain Bacteria
Xylella fastidiosa subsp. sandyi Other languages: leaf scald of oleander XYLEFS Absent c(“Annex I/A2”, “Emergency measures”) c(1992, 2015) c(NA, NA) c(2017, NA) NA Unclassified: Nerium oleander America: USA Bacteria
Pseudocercospora angolensis Synonym: Cercospora angolensis, Phaeoramularia angolensis, Pseudophaeoramularia angolensis; Other languages: Blattfleckenkrankheit: Zitrus, cercosporiose des agrumes, fruit spot of citrus, leaf spot of citrus CERCAN A1 list 2001 NA NA Annex II/A1 1992 NA NA Africa: East Africa A2 list 2001/ Asia: Israel Quarantine pest 2009/ RPPO/EU: APPPC A1 list 1993 Major: Citrus paradisi, Citrus reticulata, Citrus sinensis; Minor: Citrus, Citrus aurantiifolia, Citrus aurantium, Citrus jambhiri, Citrus x limonia Africa: Angola; Burundi; Cameroon; Central African Republic; Comoros; Congo; Congo, Democratic republic of the; Cote d’Ivoire; Ethiopia; Gabon; Gambia; Ghana; Guinea; Kenya; Mozambique; Nigeria; Rwanda; Sierra Leone; Tanzania; Togo; Uganda; Zambia; Zimbabwe/ Asia: Yemen Fungi
Meloidogyne fallax Other languages: bedrieglijk maiswortelknobbel nematode, false Columbia root-knot nematode MELGFA A2 list 1998 NA NA Annex I/A2 1998 NA NA Asia: Bahrain A1 list 2003; Kazakhstan A1 list 2017/ Europe: Norway Quarantine pest 2012; Russia A1 list 2014; Ukraine A1 list 2010/ RPPO/EU: EAEU A1 list 2016; NAPPO Alert list 2002 Major: Solanum tuberosum; Minor: Asparagus officinalis, Daucus carota subsp. sativus, Fragaria x ananassa, Scorzonera hispanica; Artificial: Solanum lycopersicum Europe: Belgium; France; Germany; Netherlands; Switzerland; UK/ Oceania: Australia; New Zealand Nematoda
Coconut cadang-cadang viroid Synonym: CCCVd, Coconut cadang-cadang cocadviroid, Palm cadang-cadang viroid; Other languages: cadang cadang, yellow mottling of palms CCCVD0 A1 list 1994 NA NA Annex II/A1 1992 NA NA America: Brazil A1 list 1995; United States of America Quarantine pest 1989/ Asia: Bahrain A1 list 2003; China A2 list 1988; Israel Quarantine pest 2009/ Europe: Turkey A1 list 2007/ RPPO/EU: APPPC A2 list 1988; CPPC A1 list 1990; PPPO A2 list 1993 Major: Cocos nucifera; Minor: Arecaceae, Corypha utan, Elaeis guineensis; Unclassified: Aiphanes horrida, Aiphanes minima, Allagoptera arenaria, Caryota mitis, Caryota urens, Corypha umbraculifera, Dictyosperma album, Dypsis decaryi, Dypsis leptocheilos, Gaussia attenuata, Howea belmoreana, Howea forsteriana, Latania lontaroides, Livistona australis, Nannorrhops ritchiana, Ravenea rivularis, Syagrus schizophylla, Trachycarpus fortunei; Incidental: Arenga pinnata, Borassus flabellifer; Artificial: Adonidia merrillii, Areca catechu, Dypsis lutescens, Phoenix dactylifera, Ptychosperma macarthuri, Roystonea regia Asia: Philippines Viruses and viroids
library('dplyr'); library('tidyr'); library('DBI'); library('RSQLite');
library('magrittr'); library('kableExtra'); library('knitr'); library('RCurl')

# define pest list and connect to SQL database
pestTest <- c('Xylella fastidiosa',
             'Emerald ash borer',
             'Meloidogyne fallax',
             'Cercospora angolensis',
             'cadang-cadang')
sqlEppo <- dbConnect(RSQLite::SQLite(),
                     dbname = 'eppocodes.sqlite')

# match names from list with sql DB
sqlEppo %>% 
  dbGetQuery(paste0('SELECT codeid, fullname, codelang, preferred 
                    FROM t_names WHERE fullname LIKE ',
                    paste(paste0("'%", pestTest, "%'"),
                    collapse = " OR fullname LIKE "))) -> pestInDB

# first use unique codeid variable values, than query the sql DB
# make table of correct names, synonyms and common names in other languages
codeIdDB <- unique(pestInDB$codeid)
 
sqlEppo %>% 
  dbGetQuery(paste0('SELECT codeid, fullname, preferred, codelang, status 
                    FROM t_names WHERE codeid IN (',
                    paste(codeIdDB, collapse = ', '), ')')) %>%
  filter(status == 'A') -> pestCorNames

#query sql for codeid and eppocode
sqlEppo %>% 
  dbGetQuery(paste0('SELECT codeid, eppocode FROM t_codes WHERE codeid IN (',
                    paste0(codeIdDB, collapse = ', '),')')) %>% 
  arrange(eppocode) -> EPPOcodes

#whole working table of names
pestCorNames %>% 
  filter(preferred == 1) %>% 
  left_join(pestCorNames[which(pestCorNames$preferred == 0),],
            by = 'codeid') %>%
  select('codeid'         = 1,
         'Preferred_name' = 2,
         'preferred.x'    = 3,
         -matches('codelang.x'),
         -matches('status.x'),
         'Other_names'    = 6,
         'preferred.y'    = 7,
         'language'       = 8,
         -matches('status.y')) %>%
  mutate(Name_type = ifelse(language == 'la', 'Synonym', 'Other languages')) %>%
  mutate(Other_names = replace(Other_names, is.na(Other_names), 'none'),
         language    = replace(language, is.na(language), 'none'),
         Name_type   = replace(Name_type, is.na(Name_type), 'Preferred')) %>%
####using which(is.na()) theoretically should cut time
  arrange(Preferred_name, desc(Name_type), Other_names) -> wholeTB

wholeTB %>% 
  select(codeid, Other_names, Name_type) %>% 
  filter(Name_type != 'Preferred') %>%
  nest(Name_type, Other_names) -> altNamesNest

# collapse row groups into one string per group
tbIndex <- seq(length(altNamesNest$data))
for (i in 1:length(altNamesNest$data)) {
  altNamesNest$data[[i]] %>% 
    group_by(Name_type) %>% 
    mutate(temp_names = paste(Other_names, collapse = ', ')) %>%
    distinct(temp_names) %>% 
    mutate(temp_names = paste(Name_type, temp_names, sep = ': ')) %>% 
    ungroup() %>%
    select(temp_names) %>%
    transmute(alt_names = paste(temp_names[1], temp_names[2], sep = '; ')) %>%
    distinct() -> tbIndex[i]
}

tbIndex <- gsub('; NA', '', tbIndex)

# eppo Url's constants
apiEppoUrl <- 'https://data.eppo.int/api/rest/1.0/taxon/'
apiType <- c('/categorization', '/hosts', '/names', '/taxonomy')
####apiToken <- <your token from EPPO site>
hostsLists <- paste0(EPPOcodes[,2],'List')
catLists <- paste0(EPPOcodes[,2],'Cat')
taxoLists <- paste0(EPPOcodes[,2],'Taxo')
hostUrls <-paste0(apiEppoUrl, EPPOcodes[,2], apiType[2], apiToken)
catUrls <-paste0(apiEppoUrl, EPPOcodes[,2], apiType[1], apiToken)
taxoUrls <-paste0(apiEppoUrl, EPPOcodes[,2], apiType[4], apiToken)
distriFiles <- paste0('distribution/',EPPOcodes[,2],'dFiles')
distriLists <- paste0(EPPOcodes[,2],'distri')
distriUrls <- paste0('https://gd.eppo.int/taxon/',
                     EPPOcodes[,2],'/download/distribution_csv')

# hosts download from EPPO api, assign to proper lists/df and create addres df
for (i in 1:length(hostsLists)) {
  assign(hostsLists[i], fromJSON(getURL(hostUrls[i])))
}

EPPOcodes %>% 
  mutate(listLength = sapply(hostsLists, function(x) length(get(x))),
         hostsLists = paste0(eppocode, 'List')) %>%
  filter(listLength != 0) -> addresTable

# collapse lists into df, than merge by row with new collumn = pest eppo code
for (i in 1:length(addresTable$hostsLists)) {
  assign(addresTable$hostsLists[i], 
         bind_rows(cbind(bind_rows(get(addresTable$hostsLists[i])),
                         pestEppocode = addresTable$eppocode[i], stringsAsFactors = F)))  
}

bind_rows(mget(addresTable$hostsLists)) %>% 
  select(labelclass, full_name, eppocode = pestEppocode) %>% 
  nest(labelclass, full_name) -> nestHosts

# collapse row per groups per eppocode
hostIndex <- seq(length(nestHosts$data))
for (i in 1:length(nestHosts$data)) {
  nestHosts$data[[i]] %>% 
    group_by(labelclass) %>% 
    mutate(temp_names = paste(full_name, collapse = ', ')) %>%
    distinct(temp_names) %>% 
    mutate(temp_names = paste(labelclass, temp_names, sep = ': ')) %>% 
    ungroup() %>%
    select(temp_names) %>%
    transmute(alt_names = paste(temp_names, collapse = '; ')) %>%
    distinct() %>% 
    unlist()-> hostIndex[i]
}

# download and assign pest categorization
for (i in 1:length(catLists)) {
  assign(catLists[i], fromJSON(getURL(catUrls[i])))
}

EPPOcodes %>% 
  mutate(listLength = sapply(catLists, 
                             function(x) length(get(x)), simplify = 'vector'),
         catLists = paste0(eppocode, 'Cat')) %>%
  filter(listLength != 0) %>% 
  mutate(EPPO = 'Absent', EU = 'Absent') -> RPPOTable

# make distinct columns with EPPO and EU categorization
for (i in 1:length(RPPOTable$EPPO)) {
  if (sum(get(RPPOTable$catLists[[i]])$country == 'EPPO')) {
    
    RPPOTable[i, 5] <- paste(get(RPPOTable$catLists[[i]])[which(get(
                                 RPPOTable$catLists[[i]])$country == 'EPPO'),
                                 c(5:8)],
                             collapse = ' ')
  }
}

for (i in 1:length(RPPOTable$EU)) {
  if (sum(get(RPPOTable$catLists[[i]])$country == 'EU')) {
    
    RPPOTable[i, 6] <- paste(get(RPPOTable$catLists[[i]])[which(get(
      RPPOTable$catLists[[i]])$country == 'EU'),
      c(5:8)],
      collapse = ' ')
  }
}

bind_rows(mget(catLists), .id = 'catLists') %>%
  filter(country != 'EU') %>%
  filter(country != 'EPPO') %>%
  select(-isocode, -qlist, -yr_del, -yr_trans) %>% 
  nest(nomcontinent, country, qlistlabel, yr_add) -> nestCat

# collapse row per groups per eppocode
catIndex <- seq(length(nestCat$data))
for (i in 1:length(nestCat$data)) {
nestCat$data[[i]] %>% 
  group_by(nomcontinent) %>% 
  mutate(temp_names = paste(country, qlistlabel, yr_add, collapse = '; ')) %>%
  distinct(temp_names) %>% 
  mutate(temp_names = paste(nomcontinent, temp_names, sep = ': ')) %>% 
  ungroup() %>%
  select(temp_names) %>%
  transmute(alt_names = paste(temp_names, collapse = '/ ')) %>%
  distinct()  %>% 
  unlist() -> catIndex[i]
}

# download and assign pest taxonom
for (i in 1:length(taxoLists)) {
  assign(taxoLists[i], fromJSON(getURL(taxoUrls[i])))
}

EPPOcodes %>% 
  mutate(listLength = sapply(taxoLists, 
                             function(x) length(get(x)), simplify = 'vector'),
         taxoLists,
         taxonomy = NA) %>%
  filter(listLength != 0) -> taxoTable

taxoMain <- as.data.frame(taxoTable)
for (i in 1:length(taxoTable$taxoLists)) {
  if (get(taxoTable$taxoLists[i])[1,3] != 'Animalia') {
  taxoMain[i,5] <- get(taxoTable$taxoLists[i])[1,3]
  } else {
   taxoMain[i,5] <- get(taxoTable$taxoLists[i])[2,3]
  }
}

#define empty list
distriList <- vector(mode='list', length(distriUrls))

EPPOcodes %>% mutate(distriLists = distriLists) -> distriTable
#download and distribution files to list
for (i in 1:length(distriList)) {
    distriList[i][[1]] <- read.csv(file = distriUrls[i],
                                   header = T, stringsAsFactors = F)
}
names(distriList) <- distriLists

# bind list elements and nest
bind_rows(distriList, .id = 'distriLists') %$%
  .[c(grep('Absent', Status, invert = T)),] %>%
  mutate(country = recode(country, 
                          'United States of America'     = 'USA'),
         country = recode(country, 
                          'United Kingdom'               = 'UK'),
         country = recode(country, 
                          "Korea Dem. People's Republic" = ' North Korea')) %>%
  select(distriLists, continent, country) %>%
  unique() %>%
  nest(continent, country) -> nestDistro

# collapse row per groups per eppocode
distroIndex <- seq(length(nestDistro$data))
for (i in 1:length(nestDistro$data)) {
  nestDistro$data[[i]] %>% 
    group_by(continent) %>% 
    mutate(temp_names = paste(country, collapse = '; ')) %>%
    distinct(temp_names) %>% 
    mutate(temp_names = paste(continent, temp_names, sep = ': ')) %>% 
    ungroup() %>%
    select(temp_names) %>%
    transmute(alt_names = paste(temp_names, collapse = '/ ')) %>%
    distinct()  %>% 
    unlist() -> distroIndex[i]
}

# make whole html table
wholeTB %>%
  distinct(codeid, Preferred_name) %>%
  left_join(altNamesNest %>%
              mutate(data = tbIndex) %>%
              rename(Alternative_names = data), by = 'codeid') %>%
  full_join(nestCat %>%
              mutate(data = unlist(catIndex)) %>%
              mutate(eppocode = gsub('Cat', '', catLists)) %>%
              full_join(RPPOTable) %>%
              select(codeid, eppocode, EPPO, EU, categorization = data),
            by = 'codeid') %>%
  left_join(full_join(addresTable, nestHosts %>%
                        mutate(hosts = unlist(hostIndex)), by = 'eppocode') %>%
              select(codeid, eppocode, hosts)) %>%
  left_join(right_join(distriTable, nestDistro %>%
                        mutate(distribution = unlist(distroIndex)), by = 'distriLists') %>% 
               select(codeid, eppocode, distribution)) %>%
  full_join(taxoMain %>% select(codeid, taxonomy), by = 'codeid') %>%
  select(-codeid) %>% arrange(taxonomy, Preferred_name) %>%
  kable('html') %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))