====== Wifi Scanner ======
**Goals:** This project has two goals. I wanted to detect who was in the flat at any given time (see my [[Dynamic photo frame]] project), and I also wanted to see whether I could detect spikes of activity nearby (such as a protest walking past the building).
==== Data ====
* [[Projects/Wifi/Scan 1]]: First preliminary results from a full week wifi scan of my local area (11th Sep 2019 to 19th Sep 2019)
* [[Projects/Wifi/Scan 2]]: Full two-month scan from 1st Nov 2019 to 6th Jan 2020. Looking for general daily activity
* [[Projects/Wifi/Scan 3]]: COVID-19 - are people self-isolating? Wifi activity for March 2020
==== Setting up the hardware ====
I have a dedicated mini PC for this. It sits on the window ledge of my living room with a PCIe 5 GHz wifi card and multiple external antennas.
**Basic steps:** Identify your wifi device (in my case wlp3s0), then enter monitor mode, use tcpdump to capture MAC addresses and run a short PHP script to switch between the available channels. tcpdump keeps running in the foreground, so start the channel changer from a second terminal:
sudo ip link set wlp3s0 down
sudo iw wlp3s0 set monitor control
sudo ip link set wlp3s0 up
sudo tcpdump -i wlp3s0 -e -ttttnn > tcpdump.log
sudo ./channel_changer.php
You can find out which ''$channels'' your device supports using ''iw list'' (they are the numbers in square brackets in the 'Frequencies' section). There are about 14 on older g/n devices and more on devices that support 5 GHz. I switch channel every 0.2 seconds (same as Kismet's default channel hop time), which seems to work fine.
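The channel hopping described above can be sketched roughly as follows. This is a Python illustration, not the PHP script used here: the function names (''parse_channels'', ''set_channel'', ''hop''), the ''max_hops'' parameter and the sample line shape in the comment are my assumptions, and the only external command it relies on is ''iw dev <iface> set channel <n>''.

```python
import itertools
import re
import subprocess
import time

# Assumed shape of a line in the Frequencies section of `iw list`:
#   * 2412 MHz [1] (20.0 dBm)
# Newer iw versions may print the frequency as "2412.0 MHz".
FREQ_LINE = re.compile(r"\*\s+[\d.]+ MHz \[(\d+)\](.*)")


def parse_channels(iw_list_output):
    """Return the usable channel numbers found in `iw list` output, in order."""
    channels = []
    for line in iw_list_output.splitlines():
        match = FREQ_LINE.search(line)
        if not match:
            continue
        channel, flags = int(match.group(1)), match.group(2)
        # Skip channels that the regulatory domain has disabled.
        if "disabled" in flags:
            continue
        # The same channel can be listed once per phy; keep it only once.
        if channel not in channels:
            channels.append(channel)
    return channels


def set_channel(iface, channel):
    """Tune the monitor-mode interface to one channel (needs root)."""
    subprocess.run(["iw", "dev", iface, "set", "channel", str(channel)], check=False)


def hop(iface, channels, dwell=0.2, max_hops=None, tune=set_channel, sleep=time.sleep):
    """Cycle through the channels, staying on each one for `dwell` seconds.

    max_hops=None loops until interrupted. `tune` and `sleep` can be
    replaced so the loop can be tried without a wifi card.
    """
    sequence = itertools.cycle(channels)
    if max_hops is not None:
        sequence = itertools.islice(sequence, max_hops)
    for channel in sequence:
        tune(iface, channel)
        sleep(dwell)
```

Run as root, something like ''hop("wlp3s0", [1, 6, 11])'' would keep hopping until interrupted with Ctrl-C. The 0.2 second default for ''dwell'' mirrors the hop time mentioned above.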
==== channel_changer.php ====
#!/usr/bin/php
===== Importing the data =====
The raw tcpdump logs are large and full of redundant information: around a month of wifi scanning produces about 128 million lines of data (18.6 GB).
I run the following to reduce the logs to pairs of datetime (as YYYY-MM-DD HH:MM, with the seconds stripped off) and MAC address (see below for the PHP code):
php trim.php tcpdump.log > trimmed_tcpdump.log
On my laptop, this processes the log files at around 300k lines/second, so a month of data takes around 8 minutes. The resulting import file is reduced to approximately 3.8 million lines.
I created a simple MySQL table to store the timestamp and MAC address:
create table wifi_data (seen_time datetime, mac varchar(17), unique (seen_time,mac));
Then I import this directly into the MySQL database using the mysql client:
load data infile 'trimmed_tcpdump.log' into table wifi_data;
Once I had imported all the data I added an index on the MAC address column:
alter table wifi_data add index idx_mac(mac);
If you have any trouble with the import, you might want to split the file into more manageable parts using ''split -l 1000000 trimmed_tcpdump.log''
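To try the table layout without a MySQL server, the same idea can be sketched with Python's built-in ''sqlite3'' module. This swaps SQLite in for MySQL purely for illustration: ''INSERT OR IGNORE'' stands in for the bulk load, and the helper names ''load_pairs'' and ''minutes_seen'' are hypothetical.

```python
import sqlite3


def load_pairs(conn, lines):
    """Load 'datetime<TAB>mac' lines into wifi_data and return the row count.

    The UNIQUE constraint plus INSERT OR IGNORE drops repeated
    (seen_time, mac) pairs instead of raising an error.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wifi_data ("
        "seen_time TEXT, mac TEXT, UNIQUE (seen_time, mac))"
    )
    rows = (tuple(line.rstrip("\n").split("\t", 1)) for line in lines if line.strip())
    conn.executemany(
        "INSERT OR IGNORE INTO wifi_data (seen_time, mac) VALUES (?, ?)", rows
    )
    # Index added after loading, as in the MySQL steps above.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_mac ON wifi_data (mac)")
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM wifi_data").fetchone()[0]


def minutes_seen(conn, mac):
    """Count the distinct minutes in which one MAC address was seen."""
    query = "SELECT COUNT(*) FROM wifi_data WHERE mac = ?"
    return conn.execute(query, (mac,)).fetchone()[0]
```

A lookup by MAC address like ''minutes_seen'' is the kind of query the ''idx_mac'' index is there to speed up.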
==== trim.php ====
#!/usr/bin/php
$val) {
echo $val['datetime']."\t".$val['mac']."\n";
}
?>
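As a rough illustration of the trimming step, here is a minimal Python sketch (it is not the PHP script itself). It assumes each tcpdump line starts with a ''YYYY-MM-DD HH:MM:SS'' timestamp, as ''-tttt'' prints, and that the sender shows up as ''SA:'' or ''TA:'' followed by a MAC address; keeping only sender addresses is my assumption, as is the function name ''trim''.

```python
import re

# Assumed line shape for `tcpdump -e -tttt` on a monitor-mode interface:
#   2019-09-11 08:00:01.123456 ... DA:ff:ff:ff:ff:ff:ff SA:11:22:33:44:55:66 Probe Request
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")
# Assumption: only the source/transmitter address (SA: or TA:) is kept.
SENDER = re.compile(r"\b(?:SA|TA):((?:[0-9a-f]{2}:){5}[0-9a-f]{2})", re.IGNORECASE)


def trim(lines):
    """Yield unique 'YYYY-MM-DD HH:MM<TAB>mac' strings in first-seen order."""
    seen = set()
    for line in lines:
        stamp = TIMESTAMP.match(line)
        sender = SENDER.search(line)
        if not (stamp and sender):
            continue  # no timestamp or no sender address on this line
        pair = (stamp.group(1), sender.group(1).lower())
        if pair not in seen:
            seen.add(pair)
            yield pair[0] + "\t" + pair[1]
```

To use it on a real log, loop over ''trim(open("tcpdump.log"))'' and print each result. Dropping repeated minute/MAC pairs here also keeps the import clear of the table's unique constraint.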
===== Analysing the data =====
I've made some graphs:
* [[https://starflyer.armchairscientist.co.uk/tmp/wifi3.php|General scan #2 - Sat July 6th 2019 8am-11:30am]] - All data grouped into unique MACs per 1-minute period
* [[https://starflyer.armchairscientist.co.uk/tmp/wifi3.php|General scan #3 - Sat July 6th 2019 8am-11:30am]] - As above, with known devices/equipment (those already seen in the preceding hours/days) filtered out - an example of identifying a group of passers-by (Orange walk outside my window at 10:40am)
* [[https://starflyer.armchairscientist.co.uk/tmp/wifi4.php|General scan #4]] - Multiple days, showing the weekend and the spikes at 8am and 5pm of people passing by on their way to and from work.
I'm still working on analysing an entire uninterrupted month to get some general statistics on wifi use around my area. Updates and code to follow.
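The two groupings behind the graphs above (unique MACs per minute, and the same with previously seen devices removed) could be sketched like this in Python. It is only an illustration working on the trimmed ''datetime<TAB>mac'' lines; the function names and the one-day ''lookback'' default are assumptions, not the code behind the linked graphs.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"


def read_pairs(lines):
    """Parse 'YYYY-MM-DD HH:MM<TAB>mac' lines into (datetime, mac) tuples."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        stamp, mac = line.split("\t", 1)
        pairs.append((datetime.strptime(stamp, FMT), mac))
    return pairs


def unique_per_minute(pairs):
    """Map each minute to the number of distinct MACs seen in it."""
    macs = defaultdict(set)
    for minute, mac in pairs:
        macs[minute].add(mac)
    return {minute: len(found) for minute, found in sorted(macs.items())}


def drop_known(pairs, start, lookback=timedelta(days=1)):
    """Keep sightings from `start` onwards whose MAC was not seen during
    the `lookback` window before `start` (a stand-in for 'known' devices)."""
    known = {mac for minute, mac in pairs if start - lookback <= minute < start}
    return [(minute, mac) for minute, mac in pairs
            if minute >= start and mac not in known]
```

Calling ''unique_per_minute(drop_known(pairs, start))'' gives the filtered series, where a short burst of unfamiliar devices (such as a group walking past) stands out against the regulars.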