A table in human readable format III
A table in human readable format Part III
Last two times I showed you how easily you can extract information from EPPO SQLite database, put it into easily understandable table and how to make use of EPPO REST API. Below I show you I will show you how to download files straight into list and use to expand table. Lets prepare our workflow by defining some variables e.g. EPPOcodes.
library('dplyr'); library('tidyr'); library('DBI'); library('RSQLite');
library('magrittr'); library('kableExtra'); library('knitr'); library('RCurl');
library('jsonlite')
# define pest list and connect to SQL database
pestTest <- c('Xylella fastidiosa',
'Emerald ash borer',
'Meloidogyne fallax',
'Cercospora angolensis',
'cadang-cadang')
sqlEppo <- dbConnect(RSQLite::SQLite(),
dbname = 'eppocodes.sqlite')
# match names from list with sql DB
sqlEppo %>%
dbGetQuery(paste0('SELECT codeid, fullname, codelang, preferred
FROM t_names WHERE fullname LIKE ',
paste(paste0("'%", pestTest, "%'"),
collapse = " OR fullname LIKE "))) -> pestInDB
codeIdDB <- unique(pestInDB$codeid)
sqlEppo %>%
dbGetQuery(paste0('SELECT codeid, fullname, preferred, codelang, status
FROM t_names WHERE codeid IN (',
paste(codeIdDB, collapse = ', '), ')')) %>%
filter(status == 'A') -> pestCorNames
#query sql for codeid and eppocode
sqlEppo %>%
dbGetQuery(paste0('SELECT codeid, eppocode FROM t_codes WHERE codeid IN (',
paste0(codeIdDB, collapse = ', '),')')) %>%
arrange(eppocode) -> EPPOcodes
#variables for distribution
distriFiles <- paste0('distribution/',EPPOcodes[,2],'dFiles')
distriLists <- paste0(EPPOcodes[,2],'distri')
distriUrls <- paste0('https://gd.eppo.int/taxon/',
EPPOcodes[,2],'/download/distribution_csv')
First we define empty list, and we create distriTable
which combines EPPO code, code id and names of our distribution variables. Then using simple loop, just use read.csv()
with links as an argument to download files straight into list. Finally we use names we created before for our distribution lists to name list elements (with the same names as in distriTable
).
#define empty list
distriList <- vector(mode='list', length(distriUrls))
EPPOcodes %>% mutate(distriLists = distriLists) -> distriTable
#download and distribution files to list
for (i in 1:length(distriList)) {
distriList[i][[1]] <- read.csv(file = distriUrls[i],
header = T, stringsAsFactors = F)
}
names(distriList) <- distriLists
Now, all the information we gathered from EPPO csv files on pest distribution are in one list. Using bind_rows()
from dplyr
package we merge them into one data frame. You notice that we use piping operator %$%
instead of %>%
. That is because %>%
is used when function on the right side does have built-in data argument, otherwise you can use %$%
. In few next steps of pipeline we make some cleaning – like getting rid of records that have absent Status
or changing long country names into something shorter. Eventaully we select only columns that will be used in next steps and nest table by country
and continent
.
bind_rows(distriList, .id = 'distriLists') %$%
.[c(grep('Absent', Status, invert = T)),] %>%
mutate(country = recode(country,
'United States of America' = 'USA'),
country = recode(country,
'United Kingdom' = 'UK'),
country = recode(country,
"Korea Dem. People's Republic" = ' North Korea')) %>%
select(distriLists, continent, country) %>%
unique() %>%
nest(continent, country) -> nestDistro
So, now are base (we have a data frame of distributions for each pest) is nearly ready. We need to make further adjustments and cleaning - to have everything in one data frame with all the distribution records for each species in single cell. We also want our records in some order i.e. name of the continent, countries, name of next continent, and so on. Below you find a for
loop, that goes through every data frame in our nested variable, group by continents
, merge country
names, once again merge with continent
names, and finally create one cell containing all the names for each pest. If you are lost in some point, try to execute below code chunk, piece by piece to find out what is going on.
Now lets join our distriTable
with nested table in which we mutate()
data
column with unlisted result from previous code chunk.
kable(right_join(distriTable, nestDistro %>%
mutate(data = unlist(distroIndex))),
'html') %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
## Joining, by = "distriLists"
codeid | eppocode | distriLists | data |
---|---|---|---|
66001 | AGRLPL | AGRLPLdistri | America: Canada; USA/ Asia: China; Japan; North Korea; Korea, Republic; Mongolia; Taiwan/ Europe: Russia |
64718 | CCCVD0 | CCCVD0distri | Asia: Philippines |
410 | CERCAN | CERCANdistri | Africa: Angola; Burundi; Cameroon; Central African Republic; Comoros; Congo; Congo, Democratic republic of the; Cote d’Ivoire; Ethiopia; Gabon; Gambia; Ghana; Guinea; Kenya; Mozambique; Nigeria; Rwanda; Sierra Leone; Tanzania; Togo; Uganda; Zambia; Zimbabwe/ Asia: Yemen |
63576 | MELGFA | MELGFAdistri | Europe: Belgium; France; Germany; Netherlands; Switzerland; UK/ Oceania: Australia; New Zealand |
3243 | XYLEFA | XYLEFAdistri | America: Argentina; Brazil; Canada; Costa Rica; Mexico; Paraguay; Puerto Rico; USA; Venezuela/ Asia: Iran; Taiwan/ Europe: France; Italy; Spain |
92965 | XYLEFF | XYLEFFdistri | America: Costa Rica; Mexico; USA/ Asia: Taiwan/ Europe: Spain |
70536 | XYLEFM | XYLEFMdistri | America: Argentina; Brazil; Paraguay; USA/ Europe: France; Spain |
70537 | XYLEFP | XYLEFPdistri | America: Argentina; Brazil; Costa Rica; Ecuador/ Europe: Italy; Spain |
70538 | XYLEFS | XYLEFSdistri | America: USA |
OK. As we managed to make the last part of our Table in human readable format we are able to show final results. So lets put all the code altogether! THIS CODE NEEDS A LOT OF REFACTORING AND SIMPLIFYING
## Joining, by = c("catLists", "eppocode")
## Joining, by = c("codeid", "eppocode")
## Joining, by = c("codeid", "eppocode")
Preferred_name | Alternative_names | eppocode | EPPO | EU | categorization | hosts | distribution | taxonomy |
---|---|---|---|---|---|---|---|---|
Agrilus planipennis | Synonym: Agrilus feretrius, Agrilus marcopoli, Agrilus marcopoli ulmi; Other languages: agrile du frêne, emerald ash borer | AGRLPL | A2 list 2004 NA 2009 | Annex I/A1 2009 NA 2014 | Asia: Kazakhstan A1 list 2017/ RPPO/EU: EAEU A2 list 2016; NAPPO Alert list 2002 | Major: Fraxinus americana, Fraxinus angustifolia subsp. oxycarpa, Fraxinus chinensis, Fraxinus excelsior, Fraxinus japonica, Fraxinus lanuginosa, Fraxinus nigra, Fraxinus ornus, Fraxinus pennsylvanica, Fraxinus quadrangulata, Fraxinus rhynchophylla, Fraxinus uhdei, Fraxinus velutina; Minor: Fraxinus, Fraxinus mandshurica; Incidental: Chionanthus virginicus, Juglans ailanthifolia, Juglans mandshurica, Pterocarya rhoifolia, Ulmus davidiana; Artificial: Olea europaea | America: Canada; USA/ Asia: China; Japan; North Korea; Korea, Republic; Mongolia; Taiwan/ Europe: Russia | Arthropoda |
Xylella fastidiosa | Synonym: Grapevine Pierce’s disease agent, Xylella fastidiosa subsp. fastidiosa, Xylella fastidiosa subsp. piercei; Other languages: Anaheim disease, California vine disease, dwarf disease of alfalfa, dwarf disease of lucerne, enanismo de la alfalfa, enfermedad de Pierce, leaf scorch of almond, leaf scorch of maple, maladie de Pierce, Pierce’s disease of grapevine | XYLEFA | A2 list 1981 NA 2017 | c(“Emergency measures”, “Annex I/A2”) c(2015, 1992) c(NA, NA) c(NA, 2017) | Africa: Southern Africa A1 list 2001/ America: Argentina A2 list 1995; Brazil A1 list 1992; Canada A1 list 1995; Chile A1 list 1992; Paraguay A1 list 1992; Uruguay A2 list 1992/ Asia: Bahrain A1 list 2003; Israel Quarantine pest 2009; Jordan Quarantine pest 2007/ Europe: Russia A1 list 2014; Turkey A1 list 2007; Ukraine A1 list 2010/ Oceania: New Zealand Quarantine pest 2000/ RPPO/EU: COSAVE A2 list 1992; IAPSC A1 list 1989; NAPPO Alert list 2001 | Major: Citrus sinensis, Coffea, Nerium oleander, Olea europaea, Polygala myrtifolia, Prunus persica, Vitis vinifera, woody plants; Minor: Acacia dealbata, Acacia saligna, Acer pseudoplatanus, Acer rubrum, Anthyllis hermanniae, Artemisia arborescens, Asparagus acutifolius, Calicotome villosa, Carya illinoinensis, Catharanthus, Cistus creticus, Cistus monspeliensis, Cistus salviifolius, Citroncirus, Citrus, Coronilla valentina, Cytisus scoparius, Dodonaea viscosa, Eremophila maculata, Erigeron bonariensis, Erigeron sumatrensis, Erysimum, Euphorbia terracina, Fortunella, Genista corsica, Genista ephedroides, Genista x spachiana, Grevillea juniperina, Hebe, Helichrysum italicum, Heliotropium europaeum, Laurus nobilis, Lavandula angustifolia, Lavandula dentata, Lavandula stoechas, Lavandula x heterophylla, Lavandula x intermedia, Liquidambar styraciflua, Medicago sativa, Metrosideros excelsa, Morus alba, Morus rubra, Myoporum insulare, Myrtus communis, Pelargonium fragrans, Pelargonium graveolens, Persea americana, Phagnalon saxatile, Phillyrea latifolia, Platanus occidentalis, Poncirus trifoliata, Prunus angustifolia, Prunus armeniaca, Prunus avium, Prunus cerasifera, Prunus domestica, Prunus dulcis, Prunus salicina, Quercus palustris, Quercus rubra, Quercus suber, Rhamnus alaternus, Rosa Cluster-flowered bush hybrids, Rosa multiflora, Rosmarinus officinalis, Spartium junceum, Streptocarpus, Ulmus americana, Vaccinium corymbosum, Vaccinium virgatum, Vinca minor, Vitis, Vitis labrusca, Westringia fruticosa, Westringia glabra, x Citrofortunella microcarpa; Wild/Weed: Chenopodium album, Cyperaceae, Poaceae, Sorghum halepense | America: Argentina; Brazil; Canada; Costa Rica; Mexico; Paraguay; Puerto Rico; USA; Venezuela/ Asia: Iran; Taiwan/ Europe: France; Italy; Spain | Bacteria |
Xylella fastidiosa subsp. fastidiosa | NA | XYLEFF | Absent | c(“Annex I/A2”, “Emergency measures”) c(1992, 2015) c(NA, NA) c(2017, NA) | NA | Unclassified: Coffea, Erysimum, Nerium oleander, Polygala myrtifolia, Prunus avium, Prunus dulcis, Rosmarinus officinalis, Streptocarpus, Vitis vinifera | America: Costa Rica; Mexico; USA/ Asia: Taiwan/ Europe: Spain | Bacteria |
Xylella fastidiosa subsp. multiplex | Other languages: échaudure des feuilles du prunier, leaf scald of plum, leaf scorch of elm, leaf scorch of mulberry, leaf scorch of oak, leaf scorch of plane, peach phony agent, peach virus 4, phony disease of peach, prunus virus 3 | XYLEFM | Absent | c(“Annex I/A2”, “Emergency measures”) c(1992, 2015) c(NA, NA) c(2017, NA) | NA | Unclassified: Acacia dealbata, Acer pseudoplatanus, Anthyllis hermanniae, Artemisia arborescens, Asparagus acutifolius, Calicotome villosa, Carya illinoinensis, Cistus creticus, Cistus monspeliensis, Cistus salviifolius, Coronilla valentina, Cytisus scoparius, Genista corsica, Genista ephedroides, Genista x spachiana, Hebe, Helichrysum italicum, Lavandula angustifolia, Lavandula dentata, Lavandula stoechas, Lavandula x heterophylla, Metrosideros excelsa, Myrtus communis, Pelargonium graveolens, Phagnalon saxatile, Polygala myrtifolia, Prunus cerasifera, Prunus dulcis, Quercus suber, Rosa Cluster-flowered bush hybrids, Rosmarinus officinalis, Spartium junceum, Vaccinium | America: Argentina; Brazil; Paraguay; USA/ Europe: France; Spain | Bacteria |
Xylella fastidiosa subsp. pauca | Other languages: chlorose variégée des agrumes, citrus variegated chlorosis agent, variegated chlorosis of citrus | XYLEFP | Absent | c(“Annex I/A2”, “Emergency measures”) c(1992, 2015) c(NA, NA) c(2017, NA) | NA | Major: Olea europaea; Unclassified: Acacia saligna, Asparagus acutifolius, Catharanthus, Chenopodium album, Cistus creticus, Citrus, Coffea, Dodonaea viscosa, Eremophila maculata, Erigeron bonariensis, Erigeron sumatrensis, Euphorbia terracina, Grevillea juniperina, Heliotropium europaeum, Laurus nobilis, Lavandula angustifolia, Lavandula stoechas, Myoporum insulare, Myrtus communis, Nerium oleander, Pelargonium fragrans, Phillyrea latifolia, Polygala myrtifolia, Prunus avium, Prunus dulcis, Rhamnus alaternus, Rosmarinus officinalis, Spartium junceum, Vinca, Westringia fruticosa, Westringia glabra | America: Argentina; Brazil; Costa Rica; Ecuador/ Europe: Italy; Spain | Bacteria |
Xylella fastidiosa subsp. sandyi | Other languages: leaf scald of oleander | XYLEFS | Absent | c(“Annex I/A2”, “Emergency measures”) c(1992, 2015) c(NA, NA) c(2017, NA) | NA | Unclassified: Nerium oleander | America: USA | Bacteria |
Pseudocercospora angolensis | Synonym: Cercospora angolensis, Phaeoramularia angolensis, Pseudophaeoramularia angolensis; Other languages: Blattfleckenkrankheit: Zitrus, cercosporiose des agrumes, fruit spot of citrus, leaf spot of citrus | CERCAN | A1 list 2001 NA NA | Annex II/A1 1992 NA NA | Africa: East Africa A2 list 2001/ Asia: Israel Quarantine pest 2009/ RPPO/EU: APPPC A1 list 1993 | Major: Citrus paradisi, Citrus reticulata, Citrus sinensis; Minor: Citrus, Citrus aurantiifolia, Citrus aurantium, Citrus jambhiri, Citrus x limonia | Africa: Angola; Burundi; Cameroon; Central African Republic; Comoros; Congo; Congo, Democratic republic of the; Cote d’Ivoire; Ethiopia; Gabon; Gambia; Ghana; Guinea; Kenya; Mozambique; Nigeria; Rwanda; Sierra Leone; Tanzania; Togo; Uganda; Zambia; Zimbabwe/ Asia: Yemen | Fungi |
Meloidogyne fallax | Other languages: bedrieglijk maiswortelknobbel nematode, false Columbia root-knot nematode | MELGFA | A2 list 1998 NA NA | Annex I/A2 1998 NA NA | Asia: Bahrain A1 list 2003; Kazakhstan A1 list 2017/ Europe: Norway Quarantine pest 2012; Russia A1 list 2014; Ukraine A1 list 2010/ RPPO/EU: EAEU A1 list 2016; NAPPO Alert list 2002 | Major: Solanum tuberosum; Minor: Asparagus officinalis, Daucus carota subsp. sativus, Fragaria x ananassa, Scorzonera hispanica; Artificial: Solanum lycopersicum | Europe: Belgium; France; Germany; Netherlands; Switzerland; UK/ Oceania: Australia; New Zealand | Nematoda |
Coconut cadang-cadang viroid | Synonym: CCCVd, Coconut cadang-cadang cocadviroid, Palm cadang-cadang viroid; Other languages: cadang cadang, yellow mottling of palms | CCCVD0 | A1 list 1994 NA NA | Annex II/A1 1992 NA NA | America: Brazil A1 list 1995; United States of America Quarantine pest 1989/ Asia: Bahrain A1 list 2003; China A2 list 1988; Israel Quarantine pest 2009/ Europe: Turkey A1 list 2007/ RPPO/EU: APPPC A2 list 1988; CPPC A1 list 1990; PPPO A2 list 1993 | Major: Cocos nucifera; Minor: Arecaceae, Corypha utan, Elaeis guineensis; Unclassified: Aiphanes horrida, Aiphanes minima, Allagoptera arenaria, Caryota mitis, Caryota urens, Corypha umbraculifera, Dictyosperma album, Dypsis decaryi, Dypsis leptocheilos, Gaussia attenuata, Howea belmoreana, Howea forsteriana, Latania lontaroides, Livistona australis, Nannorrhops ritchiana, Ravenea rivularis, Syagrus schizophylla, Trachycarpus fortunei; Incidental: Arenga pinnata, Borassus flabellifer; Artificial: Adonidia merrillii, Areca catechu, Dypsis lutescens, Phoenix dactylifera, Ptychosperma macarthuri, Roystonea regia | Asia: Philippines | Viruses and viroids |
library('dplyr'); library('tidyr'); library('DBI'); library('RSQLite');
library('magrittr'); library('kableExtra'); library('knitr'); library('RCurl')
# define pest list and connect to SQL database
pestTest <- c('Xylella fastidiosa',
'Emerald ash borer',
'Meloidogyne fallax',
'Cercospora angolensis',
'cadang-cadang')
sqlEppo <- dbConnect(RSQLite::SQLite(),
dbname = 'eppocodes.sqlite')
# match names from list with sql DB
sqlEppo %>%
dbGetQuery(paste0('SELECT codeid, fullname, codelang, preferred
FROM t_names WHERE fullname LIKE ',
paste(paste0("'%", pestTest, "%'"),
collapse = " OR fullname LIKE "))) -> pestInDB
# first use unique codeid variable values, than query the sql DB
# make table of correct names, synonyms and common names in other languages
codeIdDB <- unique(pestInDB$codeid)
sqlEppo %>%
dbGetQuery(paste0('SELECT codeid, fullname, preferred, codelang, status
FROM t_names WHERE codeid IN (',
paste(codeIdDB, collapse = ', '), ')')) %>%
filter(status == 'A') -> pestCorNames
#query sql for codeid and eppocode
sqlEppo %>%
dbGetQuery(paste0('SELECT codeid, eppocode FROM t_codes WHERE codeid IN (',
paste0(codeIdDB, collapse = ', '),')')) %>%
arrange(eppocode) -> EPPOcodes
#whole working table of names
pestCorNames %>%
filter(preferred == 1) %>%
left_join(pestCorNames[which(pestCorNames$preferred == 0),],
by = 'codeid') %>%
select('codeid' = 1,
'Preferred_name' = 2,
'preferred.x' = 3,
-matches('codelang.x'),
-matches('status.x'),
'Other_names' = 6,
'preferred.y' = 7,
'language' = 8,
-matches('status.y')) %>%
mutate(Name_type = ifelse(language == 'la', 'Synonym', 'Other languages')) %>%
mutate(Other_names = replace(Other_names, is.na(Other_names), 'none'),
language = replace(language, is.na(language), 'none'),
Name_type = replace(Name_type, is.na(Name_type), 'Preferred')) %>%
####using which(is.na()) theoretically should cut time
arrange(Preferred_name, desc(Name_type), Other_names) -> wholeTB
wholeTB %>%
select(codeid, Other_names, Name_type) %>%
filter(Name_type != 'Preferred') %>%
nest(Name_type, Other_names) -> altNamesNest
# collapse row groups into one string per group
tbIndex <- seq(length(altNamesNest$data))
for (i in 1:length(altNamesNest$data)) {
altNamesNest$data[[i]] %>%
group_by(Name_type) %>%
mutate(temp_names = paste(Other_names, collapse = ', ')) %>%
distinct(temp_names) %>%
mutate(temp_names = paste(Name_type, temp_names, sep = ': ')) %>%
ungroup() %>%
select(temp_names) %>%
transmute(alt_names = paste(temp_names[1], temp_names[2], sep = '; ')) %>%
distinct() -> tbIndex[i]
}
tbIndex <- gsub('; NA', '', tbIndex)
# eppo Url's constants
apiEppoUrl <- 'https://data.eppo.int/api/rest/1.0/taxon/'
apiType <- c('/categorization', '/hosts', '/names', '/taxonomy')
####apiToken <- <your token from EPPO site>
hostsLists <- paste0(EPPOcodes[,2],'List')
catLists <- paste0(EPPOcodes[,2],'Cat')
taxoLists <- paste0(EPPOcodes[,2],'Taxo')
hostUrls <-paste0(apiEppoUrl, EPPOcodes[,2], apiType[2], apiToken)
catUrls <-paste0(apiEppoUrl, EPPOcodes[,2], apiType[1], apiToken)
taxoUrls <-paste0(apiEppoUrl, EPPOcodes[,2], apiType[4], apiToken)
distriFiles <- paste0('distribution/',EPPOcodes[,2],'dFiles')
distriLists <- paste0(EPPOcodes[,2],'distri')
distriUrls <- paste0('https://gd.eppo.int/taxon/',
EPPOcodes[,2],'/download/distribution_csv')
# hosts download from EPPO api, assign to proper lists/df and create addres df
for (i in 1:length(hostsLists)) {
assign(hostsLists[i], fromJSON(getURL(hostUrls[i])))
}
EPPOcodes %>%
mutate(listLength = sapply(hostsLists, function(x) length(get(x))),
hostsLists = paste0(eppocode, 'List')) %>%
filter(listLength != 0) -> addresTable
# collapse lists into df, than merge by row with new collumn = pest eppo code
for (i in 1:length(addresTable$hostsLists)) {
assign(addresTable$hostsLists[i],
bind_rows(cbind(bind_rows(get(addresTable$hostsLists[i])),
pestEppocode = addresTable$eppocode[i], stringsAsFactors = F)))
}
bind_rows(mget(addresTable$hostsLists)) %>%
select(labelclass, full_name, eppocode = pestEppocode) %>%
nest(labelclass, full_name) -> nestHosts
# collapse row per groups per eppocode
hostIndex <- seq(length(nestHosts$data))
for (i in 1:length(nestHosts$data)) {
nestHosts$data[[i]] %>%
group_by(labelclass) %>%
mutate(temp_names = paste(full_name, collapse = ', ')) %>%
distinct(temp_names) %>%
mutate(temp_names = paste(labelclass, temp_names, sep = ': ')) %>%
ungroup() %>%
select(temp_names) %>%
transmute(alt_names = paste(temp_names, collapse = '; ')) %>%
distinct() %>%
unlist()-> hostIndex[i]
}
# download and assign pest categorization
for (i in 1:length(catLists)) {
assign(catLists[i], fromJSON(getURL(catUrls[i])))
}
EPPOcodes %>%
mutate(listLength = sapply(catLists,
function(x) length(get(x)), simplify = 'vector'),
catLists = paste0(eppocode, 'Cat')) %>%
filter(listLength != 0) %>%
mutate(EPPO = 'Absent', EU = 'Absent') -> RPPOTable
# make distinct columns with EPPO and EU categorization
for (i in 1:length(RPPOTable$EPPO)) {
if (sum(get(RPPOTable$catLists[[i]])$country == 'EPPO')) {
RPPOTable[i, 5] <- paste(get(RPPOTable$catLists[[i]])[which(get(
RPPOTable$catLists[[i]])$country == 'EPPO'),
c(5:8)],
collapse = ' ')
}
}
for (i in 1:length(RPPOTable$EU)) {
if (sum(get(RPPOTable$catLists[[i]])$country == 'EU')) {
RPPOTable[i, 6] <- paste(get(RPPOTable$catLists[[i]])[which(get(
RPPOTable$catLists[[i]])$country == 'EU'),
c(5:8)],
collapse = ' ')
}
}
bind_rows(mget(catLists), .id = 'catLists') %>%
filter(country != 'EU') %>%
filter(country != 'EPPO') %>%
select(-isocode, -qlist, -yr_del, -yr_trans) %>%
nest(nomcontinent, country, qlistlabel, yr_add) -> nestCat
# collapse row per groups per eppocode
catIndex <- seq(length(nestCat$data))
for (i in 1:length(nestCat$data)) {
nestCat$data[[i]] %>%
group_by(nomcontinent) %>%
mutate(temp_names = paste(country, qlistlabel, yr_add, collapse = '; ')) %>%
distinct(temp_names) %>%
mutate(temp_names = paste(nomcontinent, temp_names, sep = ': ')) %>%
ungroup() %>%
select(temp_names) %>%
transmute(alt_names = paste(temp_names, collapse = '/ ')) %>%
distinct() %>%
unlist() -> catIndex[i]
}
# download and assign pest taxonom
for (i in 1:length(taxoLists)) {
assign(taxoLists[i], fromJSON(getURL(taxoUrls[i])))
}
EPPOcodes %>%
mutate(listLength = sapply(taxoLists,
function(x) length(get(x)), simplify = 'vector'),
taxoLists,
taxonomy = NA) %>%
filter(listLength != 0) -> taxoTable
taxoMain <- as.data.frame(taxoTable)
for (i in 1:length(taxoTable$taxoLists)) {
if (get(taxoTable$taxoLists[i])[1,3] != 'Animalia') {
taxoMain[i,5] <- get(taxoTable$taxoLists[i])[1,3]
} else {
taxoMain[i,5] <- get(taxoTable$taxoLists[i])[2,3]
}
}
#define empty list
distriList <- vector(mode='list', length(distriUrls))
EPPOcodes %>% mutate(distriLists = distriLists) -> distriTable
#download and distribution files to list
for (i in 1:length(distriList)) {
distriList[i][[1]] <- read.csv(file = distriUrls[i],
header = T, stringsAsFactors = F)
}
names(distriList) <- distriLists
# bind list elements and nest
bind_rows(distriList, .id = 'distriLists') %$%
.[c(grep('Absent', Status, invert = T)),] %>%
mutate(country = recode(country,
'United States of America' = 'USA'),
country = recode(country,
'United Kingdom' = 'UK'),
country = recode(country,
"Korea Dem. People's Republic" = ' North Korea')) %>%
select(distriLists, continent, country) %>%
unique() %>%
nest(continent, country) -> nestDistro
# collapse row per groups per eppocode
distroIndex <- seq(length(nestDistro$data))
for (i in 1:length(nestDistro$data)) {
nestDistro$data[[i]] %>%
group_by(continent) %>%
mutate(temp_names = paste(country, collapse = '; ')) %>%
distinct(temp_names) %>%
mutate(temp_names = paste(continent, temp_names, sep = ': ')) %>%
ungroup() %>%
select(temp_names) %>%
transmute(alt_names = paste(temp_names, collapse = '/ ')) %>%
distinct() %>%
unlist() -> distroIndex[i]
}
# make whole html table
wholeTB %>%
distinct(codeid, Preferred_name) %>%
left_join(altNamesNest %>%
mutate(data = tbIndex) %>%
rename(Alternative_names = data), by = 'codeid') %>%
full_join(nestCat %>%
mutate(data = unlist(catIndex)) %>%
mutate(eppocode = gsub('Cat', '', catLists)) %>%
full_join(RPPOTable) %>%
select(codeid, eppocode, EPPO, EU, categorization = data),
by = 'codeid') %>%
left_join(full_join(addresTable, nestHosts %>%
mutate(hosts = unlist(hostIndex)), by = 'eppocode') %>%
select(codeid, eppocode, hosts)) %>%
left_join(right_join(distriTable, nestDistro %>%
mutate(distribution = unlist(distroIndex)), by = 'distriLists') %>%
select(codeid, eppocode, distribution)) %>%
full_join(taxoMain %>% select(codeid, taxonomy), by = 'codeid') %>%
select(-codeid) %>% arrange(taxonomy, Preferred_name) %>%
kable('html') %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))