| | tesco_scraper [2022/05/02 14:26] admin | | tesco_scraper [2022/05/02 14:38] (current) admin |
|---|---|---|---|
| | Line 1: | | Line 1: |
| | ====== Tesco Scraper ====== | | ====== Tesco Scraper ====== |
| | I threw this together one afternoon because I was frustrated that, even though Tesco displays the 'price-per-ml' for things like cans and bottles of Coca-Cola, it doesn't let you sort by 'price-per-ml'. So I wrote a quick script that downloads an entire subcategory of products, extracts the data from them, and outputs it all to a CSV that I can filter and sort myself. | | I threw this together one afternoon because I was frustrated that, even though Tesco displays the 'price-per-ml' for things like cans and bottles of Coca-Cola, it doesn't let you sort by 'price-per-ml'. So I wrote a quick script that downloads an entire subcategory of products, extracts the data from them, and outputs it all to a CSV that I can filter and sort myself. |
| | | + | |
| | | + | It was easier than I expected (I didn't have to do much scraping) because there is a data attribute on the page's body tag that holds all the product details in JSON format. |
| | `<code php tesco-to-csv.php>` | | `<code php tesco-to-csv.php>` |
| | Line 46: | | Line 48: |
| | `}` | | `}` |
| - | `$lines = array('Product Name', 'Tesco URL', 'Brand', 'Price', 'Price per unit', 'Unit Measure');` | + | `$lines = array();` |
| | | + | `$lines[] = array('Product Name', 'Tesco URL', 'Brand', 'Price', 'Price per unit', 'Unit Measure');` |
| | `// read through each of the pages, parse the json and output as csv` | | `// read through each of the pages, parse the json and output as csv` |
| | `echo "Parsing JSON to CSV\n";` | | `echo "Parsing JSON to CSV\n";` |
| | Line 81: | | Line 84: |
| | `fputcsv($fp, $line);` | | `fputcsv($fp, $line);` |
| | `}` | | `}` |
| | | + | |
| | `</code>` | | `</code>` |
| | Line 94: | | Line 98: |
| | `Writing CSV` | | `Writing CSV` |
| | `</code>` | | `</code>` |
| | | + | |
| | | + | {{:pasted:20220502-153759.png}} |
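The page says the product data sits as JSON in a data attribute on the `<body>` tag, so very little real scraping is needed. Below is a minimal sketch of reading such an attribute with PHP's DOM extension. The attribute name (`data-props`), the helper name `extractBodyJson()` and the sample markup are all hypothetical; the real attribute name and JSON layout on Tesco's pages are not given here, so check the page source before relying on this.

```php
<?php
// Hypothetical helper: read a JSON blob stored in a data-* attribute on <body>.
// The attribute name is a parameter because the real one is not stated on this page.
function extractBodyJson(string $html, string $attribute): ?array
{
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid markup; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Common workaround intended to make loadHTML() treat the input as UTF-8
    // rather than its ISO-8859-1 default.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null || !$body->hasAttribute($attribute)) {
        return null;
    }
    // getAttribute() returns the value with entities such as &quot; already decoded.
    $data = json_decode($body->getAttribute($attribute), true);
    return is_array($data) ? $data : null;
}

// Made-up sample markup; a real product listing page carries far more JSON.
$html = '<html><body data-props="{&quot;products&quot;:[{&quot;title&quot;:&quot;Cola 330ml&quot;}]}"></body></html>';
$data = extractBodyJson($html, 'data-props');
echo $data['products'][0]['title'], "\n";
```

Returning `null` for a missing attribute or invalid JSON lets a loop over many downloaded pages skip the odd bad page instead of stopping.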
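The `$lines` change in this revision matters because the script appears to pass each element of `$lines` to `fputcsv()`, which expects an array of fields per row. With the old flat array, each element was a bare string, which `fputcsv()` rejects. The sketch below keeps the header as a row of its own. It also sorts by price per unit before writing, which is an addition for illustration: the original script leaves sorting to whatever opens the CSV. The product records and their keys (`name`, `unitPrice`, and so on) are made up and do not reflect Tesco's actual JSON.

```php
<?php
// Made-up records standing in for data parsed from the page; the keys are hypothetical.
$products = [
    ['name' => 'Product A', 'url' => 'https://example.com/a', 'brand' => 'Brand A',
     'price' => 9.00, 'unitPrice' => 1.14, 'unit' => 'litre'],
    ['name' => 'Product B', 'url' => 'https://example.com/b', 'brand' => 'Brand B',
     'price' => 1.50, 'unitPrice' => 0.86, 'unit' => 'litre'],
];

// Cheapest per unit first; only meaningful if every row uses the same unit.
usort($products, fn(array $a, array $b): int => $a['unitPrice'] <=> $b['unitPrice']);

// Every element of $lines is one CSV row, so the header is a row of its own.
$lines = [];
$lines[] = ['Product Name', 'Tesco URL', 'Brand', 'Price', 'Price per unit', 'Unit Measure'];
foreach ($products as $p) {
    $lines[] = [$p['name'], $p['url'], $p['brand'], $p['price'], $p['unitPrice'], $p['unit']];
}

// Write to an in-memory stream here; the script itself writes to a file handle.
$fp = fopen('php://memory', 'w+');
foreach ($lines as $line) {
    // $line must be an array of fields; an empty $escape (PHP 7.4+) disables backslash escaping.
    fputcsv($fp, $line, ',', '"', '');
}
rewind($fp);
$csv = stream_get_contents($fp);
fclose($fp);
echo $csv;
```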