Python - Using the `.find_next_siblings` function in Beautiful Soup
I am attempting to write the output of web scraping to a CSV file. Here is my code:
    import bs4
    import requests
    import csv

    # Get the webpage for Apple Inc.'s September income statement
    page = requests.get("https://au.finance.yahoo.com/q/is?s=aapl")

    # Put it into Beautiful Soup
    soup = bs4.BeautifulSoup(page.content)

    # Select the table that holds the data of interest
    table = soup.find("table", class_="yfnc_tabledata1")

    # Creates the headers for the table
    headers = table.find('tr', class_="yfnc_modtitle1")

    # Creates a generator that holds the 4 values of yearly revenues for the company
    total_revenue = headers.next_sibling
    cost_of_revenue = total_revenue.next_sibling
    gross_profit = cost_of_revenue.next_sibling.next_sibling
    wang = headers.find_next_siblings("tr")

    # Iterates through the generator above and writes the output to a CSV file
    with open('/home/kwal0203/desktop/apple.csv', 'a') as csvfile:
        writer = csv.writer(csvfile, delimiter="|")
        writer.writerow([value.get_text(strip=True).encode("utf-8") for value in headers])
        writer.writerow([value.get_text(strip=True).encode("utf-8") for value in total_revenue])
        writer.writerow([value.get_text(strip=True).encode("utf-8") for value in cost_of_revenue])
        writer.writerow([value.get_text(strip=True).encode("utf-8") for value in gross_profit])
        for dude in wang:
            writer.writerow([dude.get_text(strip=True).encode("utf-8")])
The problem is that I am repeating a lot of code when creating and writing each row to the CSV. You can see that I keep repeating next_sibling to get the next row of values. I found the .find_next_siblings() function in Beautiful Soup, which does what I want, except that each row the function reads gets output to a single cell of the CSV file.
Any ideas? Let me know if the question is not clear.
Thanks.
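To illustrate why each row ends up in a single cell, here is a minimal sketch. It assumes Python 3 and uses a small hypothetical HTML fragment (`SAMPLE_HTML`, with made-up column labels) in place of the real Yahoo Finance page, whose markup is not reproduced here. Calling `get_text()` on a whole `<tr>` yields one joined string, whereas calling it on each `<td>` yields one value per column.

```python
import bs4

# Hypothetical stand-in markup; only the class name is taken from the question.
SAMPLE_HTML = """
<table>
<tr class="yfnc_modtitle1"><td>Period Ending</td><td>Col A</td><td>Col B</td></tr>
<tr><td>Total Revenue</td><td>42,123,000</td><td>37,432,000</td></tr>
<tr><td>Cost of Revenue</td><td>26,114,000</td><td>22,697,000</td></tr>
</table>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")
headers = soup.find("tr", class_="yfnc_modtitle1")

# find_next_siblings("tr") returns only the following <tr> tags,
# skipping the whitespace text nodes between them.
first_row = headers.find_next_siblings("tr")[0]

# get_text() on the <tr> joins the text of every cell into one string,
# so writerow() receives a one-item list: a single CSV cell.
whole_row = [first_row.get_text(strip=True)]

# Calling get_text() on each <td> gives one string per cell,
# so writerow() receives one item per column.
per_cell = [td.get_text(strip=True) for td in first_row.find_all("td")]

print(whole_row)
print(per_cell)
```

The loop in the question does the former: each `dude` is a whole row, wrapped in a one-item list.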
Okay, this is not a perfect solution, I suppose, but the idea is to check the next siblings for amounts and skip the rows without any:
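    import re

    # One list of cell texts per row that follows the header row
    next_rows = [[td.get_text(strip=True).encode("utf-8") for td in row('td')]
                 for row in headers.find_next_siblings("tr")]

    # Keep only cells that look like amounts (digits and commas)
    pattern = re.compile(r'^[\d,]+$')
    data = [[item for item in l if pattern.match(item)] for l in next_rows]

    # Drop rows that contain no amounts at all
    data = [l for l in data if l]

    with open('/home/kwal0203/desktop/apple.csv', 'a') as csvfile:
        writer = csv.writer(csvfile, delimiter="|")
        writer.writerows(data)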
This produces:
    42,123,000|37,432,000|45,646,000|57,594,000
    26,114,000|22,697,000|27,699,000|35,748,000
    16,009,000|14,735,000|17,947,000|21,846,000
    1,686,000|1,603,000|1,422,000|1,330,000
    3,158,000|2,850,000|2,932,000|3,053,000
    11,165,000|10,282,000|13,593,000|17,463,000
    307,000|202,000|225,000|246,000
    11,472,000|10,484,000|13,818,000|17,709,000
    11,472,000|10,484,000|13,818,000|17,709,000
    3,005,000|2,736,000|3,595,000|4,637,000
    8,467,000|7,748,000|10,223,000|13,072,000
    8,467,000|7,748,000|10,223,000|13,072,000
    8,467,000|7,748,000|10,223,000|13,072,000
These are the amounts from the table.
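For anyone who wants to try the approach without fetching the page, the following is a self-contained sketch under some assumptions: it targets Python 3 (where `csv` writes text, so the `.encode("utf-8")` calls are dropped), it parses a small hypothetical fragment (`SAMPLE_HTML`) rather than the live page, and it writes to an in-memory buffer instead of the file path above. The helper name `amount_rows` is made up for illustration.

```python
import csv
import io
import re

import bs4

# Hypothetical stand-in markup; only the class names come from the question.
SAMPLE_HTML = """
<table class="yfnc_tabledata1">
<tr class="yfnc_modtitle1"><td>Period Ending</td><td>Col A</td><td>Col B</td></tr>
<tr><td>Total Revenue</td><td>42,123,000</td><td>37,432,000</td></tr>
<tr><td colspan="3">Operating Expenses</td></tr>
<tr><td>Cost of Revenue</td><td>26,114,000</td><td>22,697,000</td></tr>
</table>
"""

# A cell counts as an amount if it consists only of digits and commas.
AMOUNT = re.compile(r"^[\d,]+$")


def amount_rows(html):
    """Return the amount cells of each row after the header, skipping rows with none."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    headers = soup.find("tr", class_="yfnc_modtitle1")
    rows = []
    for row in headers.find_next_siblings("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        amounts = [cell for cell in cells if AMOUNT.match(cell)]
        if amounts:  # skip label-only rows such as section headings
            rows.append(amounts)
    return rows


# Write to an in-memory buffer so the sketch has no side effects on disk.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
writer.writerows(amount_rows(SAMPLE_HTML))
csv_text = buffer.getvalue()
print(csv_text)
```

To write to a real file in Python 3 instead, the buffer could be swapped for `open(path, "a", newline="")`. Note that the filter also discards the row labels, just as the answer's version does.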