User Tools

Site Tools


large_image_archives

Large Image Archives

Since around 2005 my mum has taken an average of 3,200 pictures a year. That's after I removed all duplicate files. I'm still deciding on what software to suggest she uses to manage her archive of around 43k photos. In the meantime I wanted to generate some stats with the goal of making a simple interface for her to browse the backup I have of her archive (by date and maybe location).

I've already removed duplicates, but lets save an md5 of the image to help stop duplicates in future imports and also keep a record of any duplicates we deleted (there might be valuable information in the directory path of those deleted duplicates that'll help us search/organise in the future):

// I've trimmed some database connection stuff and some code that fills the $dupe_arr variable
$data = file_get_contents("list-of-images.txt");
$lines = explode("\n", $data);
 
foreach($lines as $line) {
        $line = trim($line);
                $filename = basename($line);
                $sql = "insert into images set relative_path=?, original_filename=?, md5sum=?, created_on=now(), extra_info=?";
                $md5sum = md5_file($line);
                $extra_info = "";
                if(!empty($dupe_arr[$line]))
                        $extra_info = implode("\n", $dupe_arr[$line]);
                $stmt = $db->prepare($sql);
                $stmt->execute([$line, $filename, $md5sum, $extra_info]);
                echo ".";
}

Now we have a list of 50k images in a database that has this simple schema:

mysql> describe images;
+-------------------+---------------+------+-----+---------+----------------+
| Field             | Type          | Null | Key | Default | Extra          |
+-------------------+---------------+------+-----+---------+----------------+
| id                | int(11)       | NO   | PRI | NULL    | auto_increment |
| relative_path     | varchar(1000) | YES  |     | NULL    |                |
| original_filename | varchar(1000) | YES  |     | NULL    |                |
| md5sum            | varchar(100)  | YES  |     | NULL    |                |
| created_on        | datetime      | YES  |     | NULL    |                |
| extra_info        | varchar(2000) | YES  |     | NULL    |                |
+-------------------+---------------+------+-----+---------+----------------+

Now some more code that'll extract all the exif data it can find from these images. This just runs the exif command (apt install exif) with the -m parameter which, by default, outputs all tags in a machine readable (tabbed) format. Then saves it to a database table:

<?php
require("db_connect.php");
$sql = "select * from images";
$stmt = $db->query($sql);
while($row = $stmt->fetch()) {
        $image_id = $row['id'];
        $filename = $row['relative_path'];
        $data = `exif -m "$filename"`;
        $lines = explode("\n", $data);
        foreach($lines as $line) {
                if($dline = explode("\t", $line)) {
                if(!empty($dline[0])) {
                        $sql = "insert into exif_meta set image_id=?, field_name=?, field_value=?";
                        $stmt_ins = $db->prepare($sql);
                        $name = $dline[0];
                        $value = $dline[1];
                        $stmt_ins->execute([$image_id, $name, $value]);
                }
                }
        }
}
?>

This populates a very simple table with this schema (I was just curious to look at the data - I'll extract what I need from here, or maybe add some indexes):

mysql> describe exif_meta;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| image_id    | int(11)       | YES  |     | NULL    |       |
| field_name  | varchar(1000) | YES  |     | NULL    |       |
| field_value | varchar(2000) | YES  |     | NULL    |       |
+-------------+---------------+------+-----+---------+-------+

Stats

Now, lets see what we got. 1.8 million rows for those ~50k images. Let's get actual.

Number of images 43,545 select count(id) from images
Images with EXIF data 33,945 select count(distinct image_id) from exif_meta

So, 9,600 images didn't have readable EXIF data. These are either corrupt files or images that have been resized and stripped of EXIF data or downloaded from somewhere that stripped it - or the camera just didn't support any type of EXIF at all, but that seems unlikely for the date period.

How many tags per image? About 46

Average 46.5 select avg(count) from (select count(field_name) as count from exif_meta group by image_id) as counts
Mean 48 select count,count(*) from (select count(field_name) as count from exif_meta group by image_id) as counts group by count order by count(*)
Min 8 as above
Max 59

Data

What can we extract at a glance? When were the photos taken?

Date

SELECT YEAR(field_value) AS YEAR,COUNT(image_id) AS number_of_pics FROM exif_meta WHERE field_name='date and time' GROUP BY YEAR(field_value) ORDER BY YEAR(field_value);
year number_of_pics
NULL 2
2004 2
2005 2404
2006 1551
2007 495
2008 431
2009 565
2010 1044
2011 962
2012 1862
2013 2559
2014 3278
2015 5273
2016 6763
2017 1463
2018 4265
2019 1184
2026 311

That looks about right. At one point the camera must have had it's time and date set to the future.

Manufacturer

What type of camera did she use?

SELECT COUNT(image_id), field_value FROM exif_meta WHERE field_name='Manufacturer' GROUP BY field_value ORDER BY COUNT(image_id);
count(image_id) field_value
2
53 NIKON
62 OLYMPUS IMAGING CORP.
74 Canon
120 Eastman Kodak Company
311 Samsung Techwin
1315 CASIO COMPUTER CO.,LTD
3848 SAMSUNG
3970 SONY
8355 CASIO COMPUTER CO.,LTD.
16189 Apple

That also looks about right, most of the photos were taken on a phone.

Other EXIF

Let's take an overview of what some EXIF tags contain:

EXIF: X-Resolution

SELECT COUNT(image_id), field_value FROM exif_meta WHERE field_name='X-Resolution' GROUP BY field_value ORDER BY COUNT(image_id);
count(image_id) field_value
2 150
2 75
2 96
62 314
106 300
311 1
311 96.0
1985 350
64416 72

EXIF: Orientation

SELECT COUNT(image_id), field_value FROM exif_meta WHERE field_name='Orientation' GROUP BY field_value ORDER BY COUNT(image_id);
count(image_id) field_value
178 Left-bottom
1139 Bottom-right
3841 Right-top
34828 Top-left
large_image_archives.txt · Last modified: 2019/12/06 17:11 by neil