html - Scrape a page with JavaScript from R


I'm new to web scraping in R and have run into a problem with sites that reference JavaScript. I am trying to scrape data from the web page below and have been unsuccessful. I believe the JavaScript links prevent me from accessing the table. As a result, the function "readHTMLTable" from the R package "XML" comes up NULL.

    library(XML)
    library(RCurl)
    url <- "http://votingrights.news21.com/interactive/movement-voter-id/index.html"
    tabs <- getURL(url)                      # download the raw HTML as a string
    tabs <- htmlParse(tabs, asText = TRUE)   # parse the downloaded string
    tabs <- readHTMLTable(tabs, stringsAsFactors = FALSE)
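A quick way to test that suspicion is to check whether the downloaded HTML contains any `<table>` element at all; if it does not, the table is built in the browser by JavaScript and `readHTMLTable` has nothing to find. The sketch below uses only base R on hypothetical page strings. `count_tables` is an illustrative helper, not part of XML or RCurl, and a regex count is only a rough check, not an HTML parser.

```r
# count_tables(): hypothetical helper that counts <table> start tags in raw HTML.
# A rough regex check, not a real HTML parser.
count_tables <- function(html) {
  hits <- gregexpr("<table[[:space:]>]", html, ignore.case = TRUE)[[1]]
  sum(hits > 0)  # gregexpr returns -1 when there is no match
}

# Hypothetical pages: one serves the table in its HTML, one builds it with JavaScript.
static_page <- "<html><body><table><tr><td>1</td></tr></table></body></html>"
js_page <- paste0(
  "<html><body><div id=\"chart\"></div>",
  "<script src=\"data/data.js\"></script>",
  "<script>drawTable(someData);</script></body></html>"
)

count_tables(static_page)  # 1
count_tables(js_page)      # 0
```

With the real page you would pass the string returned by getURL(url) instead of these made-up samples.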

How can I access the data behind the JavaScript links? Or is that even possible? When I use the direct link to the data (below), the R package "rjson" is still unable to read in the data.

    library("rjson")
    json_file <- "http://votingrights.news21.com/static/interactives/movement/data/fulldata.js"
    lines <- readLines(json_file)
    json_data <- fromJSON(paste(lines, collapse = ""))  # errors: the file is not pure JSON
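One way to locate a direct data link like fulldata.js is to list the `<script src=...>` references in the page HTML and look for one that carries data rather than code (the browser's developer tools show the same requests). A base-R sketch, assuming the HTML is already in a character string; `script_sources` is a hypothetical helper, and because it is regex-based it only handles quoted `src` attributes.

```r
# script_sources(): hypothetical helper returning the src attribute of every
# <script> tag in an HTML string. Regex-based, so quoted attributes only.
script_sources <- function(html) {
  tags <- regmatches(
    html,
    gregexpr("<script[^>]*[[:space:]]src[[:space:]]*=[[:space:]]*[\"'][^\"']+[\"']",
             html, ignore.case = TRUE)
  )[[1]]
  # keep only the quoted value after src=
  sub(".*src[[:space:]]*=[[:space:]]*[\"']([^\"']+)[\"'].*", "\\1", tags,
      ignore.case = TRUE)
}

# Hypothetical page markup; the paths are made up for illustration.
page <- paste0(
  "<html><head>",
  "<script src=\"js/app.js\"></script>",
  "<script type=\"text/javascript\" src=\"data/fulldata.js\"></script>",
  "</head><body><div id=\"chart\"></div></body></html>"
)

script_sources(page)  # "js/app.js" "data/fulldata.js"
```

Relative paths such as these still need to be resolved against the page URL before they can be passed to readLines.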

The file you reference is a JavaScript file containing JSON rather than pure JSON. In this case you can manually scrub the contents to get at the data:

    library("rjson")
    json_file <- "http://votingrights.news21.com/static/interactives/movement/data/fulldata.js"
    lines <- readLines(json_file)
    # strip the JavaScript assignment ("... = ") from the first line
    lines[1] <- sub(".* = (.*)", "\\1", lines[1])
    # strip the trailing semicolon from the last line
    lines[length(lines)] <- sub(";", "", lines[length(lines)])
    json_data <- fromJSON(paste(lines, collapse = "\n"))

    > head(json_data[[1]][[1]])
    $state
    [1] "alabama"

    $bill
    [1] "hb 19"

    $category
    [1] "strict photo id"

    $introduced
    [1] "mar 1, 2011"

    $house
    [1] "yes"

    $senate
    [1] "yes"
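The two sub() calls above assume the assignment sits on the first line and the semicolon on the last. A slightly more defensive variant (a sketch, not part of rjson) collapses the file first, then strips everything up to the first `=` and a trailing `;`, so it also copes with a file that is a single line. `js_to_json` is a hypothetical helper name, and it still assumes the file holds exactly one `var name = ...;` assignment.

```r
# js_to_json(): hypothetical helper that turns the lines of a file of the form
#   var someName = <JSON>;
# into a plain JSON string. Assumes exactly one such assignment per file.
js_to_json <- function(lines) {
  txt <- paste(lines, collapse = "\n")
  txt <- sub("^[^=]*=[[:space:]]*", "", txt)  # drop "var someName = "
  sub(";[[:space:]]*$", "", txt)              # drop the trailing semicolon
}

# Made-up sample in the same "name = value;" shape.
sample_lines <- c(
  "var someData = [[{\"state\": \"alabama\",",
  "  \"bill\": \"hb 19\"}]];"
)
json_txt <- js_to_json(sample_lines)

# Parse with rjson only if it is installed.
if (requireNamespace("rjson", quietly = TRUE)) {
  parsed <- rjson::fromJSON(json_txt)
  print(parsed[[1]][[1]]$state)
}
```

For the real file this would be js_to_json(readLines(json_file)).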

If you want to interact with the JavaScript data on the web page, you can use Selenium:

    library(RSelenium)
    appURL <- "http://votingrights.news21.com/static/interactives/movement/index.html"
    pJS <- phantom()  # start a PhantomJS server
    remDr <- remoteDriver(browserName = "phantom")
    remDr$open()
    remDr$navigate(appURL)
    # read the JavaScript global that holds the data (check its exact name and case in the page source)
    fullData <- remDr$executeScript("return fulldata;")
    pJS$stop()

    > head(fullData[[1]][[1]])
    $state
    [1] "alabama"

    $bill
    [1] "hb 19"

    $category
    [1] "strict photo id"

    $introduced
    [1] "mar 1, 2011"

    $house
    [1] "yes"

    $senate
    [1] "yes"
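Either route gives a nested list with one element per bill, whereas readHTMLTable would have returned a data frame. A base-R sketch that flattens such a list of records into a data frame, assuming every record is a flat named list of scalar fields and taking the column names from the first record; `records_to_df` is a hypothetical helper, and missing or NULL fields become NA.

```r
# records_to_df(): hypothetical helper; each record is a flat named list of scalars.
records_to_df <- function(records) {
  fields <- names(records[[1]])
  cols <- lapply(fields, function(f) {
    vapply(records, function(r) {
      v <- r[[f]]
      if (is.null(v)) NA_character_ else as.character(v)  # NULL or missing -> NA
    }, character(1))
  })
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# The first record mirrors the output above; the second is a made-up placeholder.
records <- list(
  list(state = "alabama", bill = "hb 19", category = "strict photo id",
       introduced = "mar 1, 2011", house = "yes", senate = "yes"),
  list(state = "placeholder", bill = NULL, category = "placeholder",
       introduced = "placeholder", house = "no", senate = "no")
)

bills <- records_to_df(records)
str(bills)
```

With the objects above this would presumably be records_to_df(json_data[[1]]) or records_to_df(fullData[[1]]), assuming [[1]] is the list of bills as the head() output suggests.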
