This shows you the differences between two versions of the page.
tesco_scraper [2022/05/02 15:26] admin |
tesco_scraper [2022/05/02 15:38] (current) admin |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Tesco Scraper ====== | ====== Tesco Scraper ====== | ||
I threw this together one afternoon because I was frustrated that, even though Tesco displays the 'price-per-ml' for things like cans and bottles of Coca-Cola, it doesn't let you sort by it. So I wrote a quick script that downloads an entire subcategory of products, extracts the data from them, then outputs them all to a CSV that I can filter and sort myself. | I threw this together one afternoon because I was frustrated that, even though Tesco displays the 'price-per-ml' for things like cans and bottles of Coca-Cola, it doesn't let you sort by it. So I wrote a quick script that downloads an entire subcategory of products, extracts the data from them, then outputs them all to a CSV that I can filter and sort myself. | ||
+ | |||
+ | It was easier than I expected (I didn't have to do much scraping), as there is a data attribute on the page's body tag that holds all the product details in JSON format.
<code php tesco-to-csv.php> | <code php tesco-to-csv.php> | ||
Line 46: | Line 48: | ||
} | } | ||
- | $lines = array('Product Name', 'Tesco URL', 'Brand', 'Price', 'Price per unit', 'Unit Measure'); | + | $lines = array(); |
+ | $lines[] = array('Product Name', 'Tesco URL', 'Brand', 'Price', 'Price per unit', 'Unit Measure'); | ||
// read through each of the pages, parse the json and output as csv | // read through each of the pages, parse the json and output as csv | ||
echo "Parsing JSON to CSV\n"; | echo "Parsing JSON to CSV\n"; | ||
Line 81: | Line 84: | ||
fputcsv($fp, $line); | fputcsv($fp, $line); | ||
} | } | ||
+ | |||
</code> | </code> | ||
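
The change to `$lines` in this revision matters because `fputcsv()` expects an array of fields for each row. With the old flat array, a loop over `$lines` would presumably have handed `fputcsv()` one string at a time instead of one row. Below is a minimal sketch of the corrected pattern; the sample product row is made up for illustration and the output goes to an in-memory stream rather than the file the real script writes.

```php
<?php
// Each element of $lines must itself be an array (one CSV row),
// because fputcsv() takes an array of fields.
$lines = array();
$lines[] = array('Product Name', 'Tesco URL', 'Brand', 'Price', 'Price per unit', 'Unit Measure');

// Made-up sample row, for illustration only (not real Tesco data).
$lines[] = array('Example Cola 330ml', 'https://example.com/product/1', 'ExampleBrand', '0.85', '0.26', '100ml');

// Write to memory here so the sketch has no side effects on disk.
$fp = fopen('php://memory', 'w+');
foreach ($lines as $line) {
    fputcsv($fp, $line);
}

// Read the generated CSV back so it can be inspected.
rewind($fp);
$csv = stream_get_contents($fp);
fclose($fp);

echo $csv;
```

Starting from an empty array and appending the header with `$lines[] = ...` keeps the header in the same shape as every product row, so a single loop can write them all.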
Line 94: | Line 98: | ||
Writing CSV | Writing CSV | ||
</code> | </code> | ||
+ | |||
+ | {{:pasted:20220502-153759.png}} |
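
As a rough illustration of the "JSON in the body tag" shortcut mentioned above, here is a minimal sketch that pulls a JSON blob out of an attribute on `<body>` and decodes it. The helper name `extractBodyJson`, the attribute name `data-example-state`, and the sample HTML are all hypothetical; the real attribute name and JSON layout on Tesco's pages are not stated here and would need to be checked against the actual page source.

```php
<?php
// Hypothetical helper: returns the decoded JSON stored in an attribute
// of the <body> tag, or null if it is missing or not valid JSON.
function extractBodyJson(string $html, string $attribute): ?array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null || !$body->hasAttribute($attribute)) {
        return null;
    }

    // The DOM parser has already decoded entities such as &quot;.
    $data = json_decode($body->getAttribute($attribute), true);
    return is_array($data) ? $data : null;
}

// Made-up page fragment; the attribute name and JSON shape are assumptions.
$html = '<html><body data-example-state="{&quot;products&quot;:[{&quot;title&quot;:&quot;Cola 330ml&quot;,&quot;price&quot;:0.85}]}"><p>page</p></body></html>';

$data = extractBodyJson($html, 'data-example-state');
echo $data['products'][0]['title'], "\n";
```

Using `DOMDocument` rather than a regular expression means the HTML entity decoding is handled by the parser, which is usually the fiddly part when JSON is embedded in an attribute.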