Speed up your bat sound analysis!
Jan 31, 2023
Looking to speed up your bat sound analysis workflow? No need to upgrade your computer!
It might make sense to assume that if you want your classifier (or Auto-ID as they're sometimes called) to go faster, you want a more powerful machine. That's one way of doing it and it does work. It is also often what is recommended to people. However, following some extensive testing I have done using multiple computers and drive combinations, there is a much (MUCH) cheaper way to improve the speed of your classifier and that is to use a faster drive. Let me explain!
A short introduction to data storage technologies
For consumer storage, you basically have two options: solid state drives (SSDs), which are fast but expensive, or hard disk drives (HDDs), which are cheap but slow. HDDs use spinning disks to store data, while SSDs use NAND-based flash memory. The speed of a drive is typically measured in terms of sequential read and write speeds, which indicate how quickly data can be transferred from the storage medium to the computer, and vice versa. Read speeds are often the higher of the two and are the ones manufacturers most often advertise. Beyond sequential read and write speeds, a second metric is often used to quantify the speed of a drive: IOPS. IOPS, or Input/Output Operations Per Second, measures how many read and write operations a storage device can perform in one second. A high number of IOPS means the drive can handle a large number of small, scattered read and write operations quickly, which matters whenever software does not read its data in one long continuous stream.
In addition to being faster than HDDs, SSDs are also more durable and less likely to be damaged by physical shocks because they do not contain any moving parts that can break or fail. This makes them a better choice for use in laptops and other portable devices. All this comes at a cost, as SSDs often cost 4-5 times as much per gigabyte and drives beyond 4TB are absurdly expensive. This makes them impractical for any kind of mass storage when you don't have a crazy budget to spend.
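To make the IOPS metric a bit more concrete, here is a minimal sketch that converts an IOPS figure into a throughput, assuming every operation moves one 4 KiB block (a common block size for such ratings, but an assumption here). The 100 IOPS value is only an illustrative order of magnitude for a spinning disk, not a measurement; the 1,400,000 IOPS value is the rating quoted later in this article for the Samsung 990 Pro.

```python
def iops_to_mb_per_s(iops, block_bytes=4096):
    """Throughput in MB/s (decimal) if every operation moves one block.

    block_bytes=4096 is an assumption; real workloads mix block sizes.
    """
    return iops * block_bytes / 1_000_000


# Illustrative value for a spinning disk doing random reads (assumption).
print(iops_to_mb_per_s(100))
# Rated random read IOPS quoted for the Samsung 990 Pro in this article.
print(iops_to_mb_per_s(1_400_000))
```

The point of the comparison is that a drive's advertised sequential speed says little about how it behaves when it is asked for many small pieces of data scattered across the disk.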
My testing
What does storage speed have to do with using classifiers to analyse bat call sequences? My most important finding is probably that upgrading one of your external drives, to use as a scratch drive, will make a much bigger difference to processing speed than upgrading to a more powerful computer. If you are already using fast SSDs, this won’t apply to you, but I also know that the majority of bat workers out there are still mostly using external hard drives.
You might think that a speed of 100 MB/s is fine because processing 20GB of data would take about 200 seconds, using some back-of-the-envelope maths. Unfortunately, that’s not how it works. The way a classifier accesses the data is comparable to how databases are accessed, i.e. it’s very random and the files will not be read in order. This is why it is reasonable to expect hard drives to struggle: spinning disks don’t generally perform well when it comes to random access to the data. And indeed, my testing shows that SSDs are substantially better than HDDs. I was still surprised by the margin by which SSDs outperformed HDDs, which led me to write this article and share my findings so that people can avoid overspending on their computer.
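If you want to get a feel for the difference between in-order and random access on your own drives, the sketch below times small block reads from a single large file, once in order and once at scattered positions. It is a rough illustration only: the function name and parameters are made up for this example, it does not reproduce how any particular classifier reads its files, and the operating system's file cache can hide the effect if the file was read recently.

```python
import os
import random
import time


def time_block_reads(path, block_size=4096, n_reads=2000, randomise=False, seed=0):
    """Read n_reads blocks from path and return (bytes_read, seconds)."""
    n_blocks = os.path.getsize(path) // block_size
    if n_blocks == 0:
        raise ValueError("file is smaller than one block")
    n_reads = min(n_reads, n_blocks)
    if randomise:
        # Scattered block positions, picked without repetition.
        block_ids = random.Random(seed).sample(range(n_blocks), n_reads)
    else:
        # Consecutive blocks from the start of the file.
        block_ids = range(n_reads)
    bytes_read = 0
    start = time.perf_counter()
    # buffering=0 turns off Python's own buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as handle:
        for block_id in block_ids:
            handle.seek(block_id * block_size)
            bytes_read += len(handle.read(block_size))
    return bytes_read, time.perf_counter() - start
```

To try it, call `time_block_reads(some_large_file)` and `time_block_reads(some_large_file, randomise=True)` on a file stored on the drive you want to test, ideally one that has not been opened since the computer was started.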
Let's dive into the results!
Results
The four classifiers tested did not all scale in the same way with drive speed. The most striking scaling is found when working with Kaleidoscope Pro, whereas the only classifier not benefiting at all from faster drives is Sonochiro.
Scaling of processing speed with drive speed
Let's compare these time savings with those achievable by upgrading from a laptop to a powerful workstation in one of the classifiers tested. Additional testing was done using Sonobat but is not shown in a table.
Kaleidoscope Pro
| Processor | LaCie rugged mini 2TB | Seagate Barracuda 8TB | G-Technology ArmorATD 5TB | QNAP NAS 32TB | SanDisk Extreme SSD 1TB | SanDisk Extreme SSD V2 2TB | Samsung 970 Evo 1TB |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ryzen 9 3900x | 1381s | 839s | 673s | 85s | 45s | 29s | 23s |
| Intel 1035 G7 | 1070s | NA | 505s | 231s | 72s | 83s | 67s |
On a fast drive, the laptop was about on par with the workstation running a slow external drive in Sonobat. In Kaleidoscope, upgrading from a laptop to a workstation yielded at best a 2.9x speed-up (67s vs 23s on the Samsung 970 Evo). While that is impressive, it doesn’t account for the cost of the upgrade, which is far higher than 2.9x the cost of the mobile chip. In contrast, the ratio between the slowest and fastest drive for Kaleidoscope on the workstation is 60x (1381s vs 23s)! In other words, chasing the fastest machine you can afford is most likely a worse way to spend your money than purchasing a quality 2-4TB SSD that you can store your data on while analysing it. These drives can usually be found for under 200 GBP/EUR/USD for the 2TB variant and will substantially decrease the time needed for your analysis.
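The ratios quoted above can be recomputed from the Kaleidoscope Pro table with a few lines of Python. This is just arithmetic on the published timings; the variable and function names are my own.

```python
# Processing times in seconds, copied from the Kaleidoscope Pro table.
# None marks the combination that was not tested (NA in the table).
DRIVES = [
    "LaCie rugged mini 2TB",
    "Seagate Barracuda 8TB",
    "G-Technology ArmorATD 5TB",
    "QNAP NAS 32TB",
    "SanDisk Extreme SSD 1TB",
    "SanDisk Extreme SSD V2 2TB",
    "Samsung 970 Evo 1TB",
]
WORKSTATION = [1381, 839, 673, 85, 45, 29, 23]  # Ryzen 9 3900x row
LAPTOP = [1070, None, 505, 231, 72, 83, 67]     # Intel laptop row


def drive_speedup(times):
    """Ratio between the slowest and the fastest drive on one machine."""
    tested = [t for t in times if t is not None]
    return max(tested) / min(tested)


def machine_speedups(laptop, workstation):
    """Laptop time divided by workstation time, for drives tested on both."""
    return {
        drive: lap / work
        for drive, lap, work in zip(DRIVES, laptop, workstation)
        if lap is not None and work is not None
    }


print(f"Drive speed-up on the workstation: {drive_speedup(WORKSTATION):.1f}x")
print(f"Drive speed-up on the laptop: {drive_speedup(LAPTOP):.1f}x")
for drive, ratio in machine_speedups(LAPTOP, WORKSTATION).items():
    print(f"Workstation vs laptop on {drive}: {ratio:.2f}x")
```

A ratio below 1 in the last loop means the laptop was actually faster than the workstation on that drive, which is the case for the slowest hard drives.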
Methods
For testing, I used a small dataset of ~20GB, containing 3008 recordings. Those recordings were made in Belgium and were not filtered prior to analysis. This means that not every file includes bat calls, as bat recorders tend to be triggered by other things as well. This test dataset was processed using four classifiers: Sonochiro, Sonobat 30 UK, TadaridaL and Kaleidoscope Pro.
I tested these classifiers across a series of drives and two different computers, one thin and light laptop (Intel 1135 g7) and one powerful workstation (AMD Ryzen 9 3900x). I was unable to test every combination as this took a substantial amount of time already, time during which I wasn’t able to do anything else. Additionally, not every combination was possible: TadaridaL wouldn't run on my laptop because it didn't have enough RAM, and there was no way to plug some of my drives, such as the Samsung 990 Pro, into the laptop because it has no exposed PCIe Gen 4 lanes. As a result, the comparisons between the drives aren’t perfectly ‘apples to apples’ but they are representative of the type of drives the average bat worker may have available for their analyses.
| Drive name | Rated read speed (MB/s) | Price per TB |
| --- | --- | --- |
| LaCie rugged mini 2TB | 130 | 48€ |
| Seagate Barracuda 8TB | 190 | 21.25€ |
| G-Technology ArmorATD 5TB | 140 | 35€ |
| QNAP NAS 32TB | 255 per drive | 48€* |
| SanDisk Extreme SSD (gen I) 1TB | 550 | 90€ |
| SanDisk Extreme SSD (gen II) 2TB | 1050 | 94€ |
| Samsung 970 Evo 1TB | 3400 | 118€ |
| Samsung 990 Pro 2TB | 7400 | 165€ |
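Coming back to the back-of-the-envelope estimate from earlier, the rated read speeds above can be compared with the Kaleidoscope Pro timings measured on the workstation. The sketch below is rough arithmetic under two assumptions: the "~20GB" dataset is taken as exactly 20,000 MB, and the two SanDisk drives in this table are the same ones as in the results table. The NAS is left out because its rated speed is given per drive.

```python
DATASET_MB = 20_000  # assumption: the "~20GB" test dataset taken as 20,000 MB

# (rated read speed in MB/s, Kaleidoscope Pro time on the workstation in s),
# both copied from the tables in this article.
DRIVES = {
    "LaCie rugged mini 2TB": (130, 1381),
    "Seagate Barracuda 8TB": (190, 839),
    "G-Technology ArmorATD 5TB": (140, 673),
    "SanDisk Extreme SSD (gen I) 1TB": (550, 45),
    "SanDisk Extreme SSD (gen II) 2TB": (1050, 29),
    "Samsung 970 Evo 1TB": (3400, 23),
}


def naive_seconds(rated_mb_per_s):
    """Time to read the whole dataset once at the rated sequential speed."""
    return DATASET_MB / rated_mb_per_s


def effective_mb_per_s(measured_seconds):
    """Dataset size divided by the time the analysis actually took."""
    return DATASET_MB / measured_seconds


for name, (rated, measured) in DRIVES.items():
    print(
        f"{name}: naive estimate {naive_seconds(rated):.0f}s, "
        f"measured {measured}s, "
        f"effective {effective_mb_per_s(measured):.1f} MB/s"
    )
```

Keep in mind that the effective figure describes the whole analysis rather than the drive alone, since the measured time also includes the classifier's own computation.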
The fastest drive I’ve been able to use so far is the Samsung 990 Pro, topping out at 1400K random read operations per second (IOPS). I would love to either build an array of 1000K+ IOPS drives to see how far I can push the IOPS of such an array, or to get my hands on the upcoming PCIe Gen 5 drive from Samsung advertising 2500K IOPS on a single drive. It won’t be cheap but an NVMe array isn’t cheap either! Does it make sense for this application? Absolutely not, but pushing the limit of the technology is always fun and it’s interesting to see how those technologies apply to our niche use case!
Notes
- The scaling is far from perfect and there are some surprising data points, such as the mobile processor beating the workstation chip when using a slow external hard drive, but if anything, those results prove the point of this article even more.
- If you look at the system requirements for Kaleidoscope Pro (Wildlife Acoustics), they'll tell you the exact same thing. I think what my testing shows is the extent to which this helps as I suspect most people overlook that part of the system requirements.
- The total price of a NAS will vary massively depending on what enclosure you buy, the features you want, your drive configuration, the drive types, etc. My enclosure is a QNAP TS-h973AX with five WD Ultrastar DC HC320 8TB drives. The usable capacity is that of four drives, leaving one drive for redundancy. This means any one drive can fail before I start losing data.
- Tadarida does not run well (or at all) on machines with 16GB of RAM or less, which is the case for my laptop. Scaling with drive speeds is still present, however, even when testing a single machine.
Conclusion
What does this mean for bat workers, in practice?
If you use Sonobat or Kaleidoscope Pro (by far the two most popular classifiers out there), investing in an SSD that you can use as a scratch drive, i.e. temporary storage while you run your analyses, will yield much better improvements in your sound analysis workflow than regularly upgrading your computers. You're better off spending the money you would have spent on an expensive CPU upgrade on a decent backup solution*!
*Backup solutions will be the focus of a future blog post.
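In practice, using a scratch drive simply means copying a dataset from slow storage to the SSD before pointing your classifier at it. A minimal sketch of that step could look like this, assuming Python is available; the function name and the example paths are hypothetical, and the copy should never replace your backups.

```python
import shutil
from pathlib import Path


def stage_to_scratch(source_dir, scratch_root):
    """Copy a recordings folder onto the scratch drive and return the copy's path.

    The original folder is left untouched on the slow drive.
    """
    source = Path(source_dir)
    target = Path(scratch_root) / source.name
    # Check that the dataset fits before starting a long copy.
    required = sum(f.stat().st_size for f in source.rglob("*") if f.is_file())
    free = shutil.disk_usage(scratch_root).free
    if required > free:
        raise OSError(f"need {required} bytes but only {free} are free")
    # copytree raises FileExistsError if the target folder already exists.
    shutil.copytree(source, target)
    return target
```

For example, `stage_to_scratch("E:/surveys/site_01", "S:/scratch")` would create `S:/scratch/site_01`, which you can then select as the input folder in your classifier and delete once the results are saved elsewhere.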
More posts on storage are coming!
We will be talking about keeping your data safe and how to deal with large datasets without spending too much. Join our mailing list so you don't miss them.