Since around 2005 my mum has taken an average of 3,200 pictures a year. That's after I removed all duplicate files. I'm still deciding on what software to suggest she uses to manage her archive of around 43k photos. In the meantime I wanted to generate some stats with the goal of making a simple interface for her to browse the backup I have of her archive (by date and maybe location).
I've already removed duplicates, but lets save an md5 of the image to help stop duplicates in future imports and also keep a record of any duplicates we deleted (there might be valuable information in the directory path of those deleted duplicates that'll help us search/organise in the future):
// I've trimmed some database connection stuff and some code that fills the $dupe_arr variable $data = file_get_contents("list-of-images.txt"); $lines = explode("\n", $data); foreach($lines as $line) { $line = trim($line); $filename = basename($line); $sql = "insert into images set relative_path=?, original_filename=?, md5sum=?, created_on=now(), extra_info=?"; $md5sum = md5_file($line); $extra_info = ""; if(!empty($dupe_arr[$line])) $extra_info = implode("\n", $dupe_arr[$line]); $stmt = $db->prepare($sql); $stmt->execute([$line, $filename, $md5sum, $extra_info]); echo "."; }
Now we have a list of 50k images in a database that has this simple schema:
mysql> describe images; +-------------------+---------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------------------+---------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | relative_path | varchar(1000) | YES | | NULL | | | original_filename | varchar(1000) | YES | | NULL | | | md5sum | varchar(100) | YES | | NULL | | | created_on | datetime | YES | | NULL | | | extra_info | varchar(2000) | YES | | NULL | | +-------------------+---------------+------+-----+---------+----------------+
Now some more code that'll extract all the exif data it can find from these images. This just runs the exif command (apt install exif
) with the -m
parameter which, by default, outputs all tags in a machine readable (tabbed) format. Then saves it to a database table:
<?php require("db_connect.php"); $sql = "select * from images"; $stmt = $db->query($sql); while($row = $stmt->fetch()) { $image_id = $row['id']; $filename = $row['relative_path']; $data = `exif -m "$filename"`; $lines = explode("\n", $data); foreach($lines as $line) { if($dline = explode("\t", $line)) { if(!empty($dline[0])) { $sql = "insert into exif_meta set image_id=?, field_name=?, field_value=?"; $stmt_ins = $db->prepare($sql); $name = $dline[0]; $value = $dline[1]; $stmt_ins->execute([$image_id, $name, $value]); } } } } ?>
This populates a very simple table with this schema (I was just curious to look at the data - I'll extract what I need from here, or maybe add some indexes):
mysql> describe exif_meta; +-------------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------+---------------+------+-----+---------+-------+ | image_id | int(11) | YES | | NULL | | | field_name | varchar(1000) | YES | | NULL | | | field_value | varchar(2000) | YES | | NULL | | +-------------+---------------+------+-----+---------+-------+
Now, lets see what we got. 1.8 million rows for those ~50k images. Let's get actual.
Number of images | 43,545 | select count(id) from images |
Images with EXIF data | 33,945 | select count(distinct image_id) from exif_meta |
So, 9,600 images didn't have readable EXIF data. These are either corrupt files or images that have been resized and stripped of EXIF data or downloaded from somewhere that stripped it - or the camera just didn't support any type of EXIF at all, but that seems unlikely for the date period.
How many tags per image? About 46
Average | 46.5 | select avg(count) from (select count(field_name) as count from exif_meta group by image_id) as counts |
Mean | 48 | select count,count(*) from (select count(field_name) as count from exif_meta group by image_id) as counts group by count order by count(*) |
Min | 8 | as above |
Max | 59 |
What can we extract at a glance? When were the photos taken?
SELECT YEAR(field_value) AS YEAR,COUNT(image_id) AS number_of_pics FROM exif_meta WHERE field_name='date and time' GROUP BY YEAR(field_value) ORDER BY YEAR(field_value);
year | number_of_pics |
---|---|
NULL | 2 |
2004 | 2 |
2005 | 2404 |
2006 | 1551 |
2007 | 495 |
2008 | 431 |
2009 | 565 |
2010 | 1044 |
2011 | 962 |
2012 | 1862 |
2013 | 2559 |
2014 | 3278 |
2015 | 5273 |
2016 | 6763 |
2017 | 1463 |
2018 | 4265 |
2019 | 1184 |
2026 | 311 |
That looks about right. At one point the camera must have had it's time and date set to the future.
What type of camera did she use?
SELECT COUNT(image_id), field_value FROM exif_meta WHERE field_name='Manufacturer' GROUP BY field_value ORDER BY COUNT(image_id);
count(image_id) | field_value |
---|---|
2 | |
53 | NIKON |
62 | OLYMPUS IMAGING CORP. |
74 | Canon |
120 | Eastman Kodak Company |
311 | Samsung Techwin |
1315 | CASIO COMPUTER CO.,LTD |
3848 | SAMSUNG |
3970 | SONY |
8355 | CASIO COMPUTER CO.,LTD. |
16189 | Apple |
That also looks about right, most of the photos were taken on a phone.
Let's take an overview of what some EXIF tags contain:
SELECT COUNT(image_id), field_value FROM exif_meta WHERE field_name='X-Resolution' GROUP BY field_value ORDER BY COUNT(image_id);
count(image_id) | field_value |
---|---|
2 | 150 |
2 | 75 |
2 | 96 |
62 | 314 |
106 | 300 |
311 | 1 |
311 | 96.0 |
1985 | 350 |
64416 | 72 |
SELECT COUNT(image_id), field_value FROM exif_meta WHERE field_name='Orientation' GROUP BY field_value ORDER BY COUNT(image_id);
count(image_id) | field_value |
---|---|
178 | Left-bottom |
1139 | Bottom-right |
3841 | Right-top |
34828 | Top-left |