Monday, November 19, 2018

Removing duplicate files and folders - Cleaning up my hard disks

Ever since I started using a computer about 20 years ago, I've carefully collected all the files I thought I might need in the future. Quite frankly, I have needed none of the files I created more than 10 years ago. Back then, I hoped that job interviewers might ask me to show the work I had done, but they never did. Having seen the HR processes from inside organizations in a few of my previous jobs, I can tell you that you would be lucky if an employer actually checked whether an applicant has done something in the past and what they have done. All these fancy new HR processes keep people busy completing too many interviews for open positions, largely for reporting's sake. They don't have the time to actually go and check an applicant's past projects and the quality of their work.
My habit of collecting and saving all the files I make for projects has reached a point where I need multiple high-capacity hard disks and cloud space. I save my project files, important tutorials, camera raw files, picture/video edits, templates, songs I like, movies I like, and many other files, including automated backups every three months. My digital hoarding has grown so far out of proportion that I keep a backup of backups in Google's cloud and important documents on OneDrive.
External HardDisk

So now, I have about 1 TB of cloud space, 8 TB of local external storage spread over multiple devices, and a bunch of SD cards and pen drives that could total up to 500 GB. Whoever reads this blog post five or ten years from now will probably smirk at these numbers, just as we now laugh at the 20 GB HD on our old P4 computer (Millennium Super computers: 😛), but by present-day standards, I think it's crazy for an individual PC user to keep this much data. At the place I worked for in 2005, the whole organization had about 2 TB of space to manage their data.
In recent months, most of my weekends have been spent trying to organize these files and eliminate duplicate files and folders. Within a few hours of starting this task, I figured out that it's not humanly possible to go through these thousands of folders and files one by one and delete the duplicates. I wasn't ready to trust the executable disk clean-up utilities on the internet either. So, with a bit of Googling, I wrote a batch (.bat) script that loops through the file list, identifies the duplicate files, and prints the details in the cmd window. Even then, it was difficult to go through the results one by one and delete them, and dangerous to let the script delete everything automatically at once.
After a few more attempts to modify the batch script so that it would prompt the user and selectively delete files and folders, I found the code messy, complex, and not very user-friendly, because the user still had to go through everything one by one. Keeping the same batch logic as the core, I wrote a Java program with a few classes, which gave much better usability. I struggled a lot with the GUI freezing while a thread was busy. When java.io was busy reading the files, in the current thread or in a separate thread, the GUI became unresponsive and I could not see the progress of the process. After digging through Stack Overflow with all I had, I found out that SwingWorker is the answer to the problem. From there, it was easy peasy. Within a few hours, the "JJCleanFF" :) utility was ready, and I was able to delete a lot of duplicate folders and files.
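To illustrate the SwingWorker pattern mentioned above, here is a minimal sketch. It is not the JJCleanFF source: the names ScanWorkerSketch and ScanWorker are made up for this example, and reading each file's size stands in for the real comparison work. The idea is that doInBackground() runs off the GUI thread, while progress updates are delivered back to the GUI thread through a property change listener.

```java
import java.awt.GraphicsEnvironment;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.swing.JFrame;
import javax.swing.JProgressBar;
import javax.swing.SwingUtilities;
import javax.swing.SwingWorker;

// Hypothetical example class, not part of JJCleanFF.
class ScanWorkerSketch {

    /** Visits every regular file under root and reports progress from 0 to 100. */
    static class ScanWorker extends SwingWorker<Integer, Void> {
        private final Path root;

        ScanWorker(Path root) {
            this.root = root;
        }

        @Override
        protected Integer doInBackground() throws IOException {
            // Runs on a background thread, so slow disk reads do not block the GUI.
            List<Path> files;
            try (Stream<Path> walk = Files.walk(root)) {
                files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (int i = 0; i < files.size(); i++) {
                Files.size(files.get(i)); // stand-in for hashing or comparing the file
                setProgress((i + 1) * 100 / files.size());
            }
            return files.size();
        }
    }

    public static void main(String[] args) throws Exception {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");

        if (GraphicsEnvironment.isHeadless()) {
            // No display available: run the scan and print the file count.
            ScanWorker worker = new ScanWorker(root);
            worker.execute();
            System.out.println(worker.get() + " files scanned");
            return;
        }

        SwingUtilities.invokeLater(() -> {
            JProgressBar bar = new JProgressBar(0, 100);
            bar.setStringPainted(true);

            JFrame frame = new JFrame("Scan progress (sketch)");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(bar);
            frame.setSize(320, 80);
            frame.setVisible(true);

            ScanWorker worker = new ScanWorker(root);
            // "progress" change events are delivered on the event dispatch thread.
            worker.addPropertyChangeListener(evt -> {
                if ("progress".equals(evt.getPropertyName())) {
                    bar.setValue((Integer) evt.getNewValue());
                }
            });
            worker.execute();
        });
    }
}
```

Run it with a folder path as the first argument; the window stays responsive while the bar fills up.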
For now, I'll share the jar file and instructions for using the utility. After cleaning up the code a little bit and adding documentation, I'll post the source code on GitHub and share the link in this post. So, here are the steps:

1. Download the executable jar file by clicking the icon below or visiting https://www.jeyaramj.com/downloads/JJCleanFF.jar
Download JJ Clean FF
2. Open the downloaded jar file.
JJ Clean FF Main Window
3. In the main window, click on "Select a folder" to select the parent folder whose folders and files will be checked for duplicates.
JJ Clean FF Browse Window
4. Once the folder is selected, a message prompt will ask you to choose the file comparison method. I coded two ways to compare the files. The first generates a cryptographic hash (SHA-512) of each file and checks it against the other folders and files. Because every file has to be read in full, this method is slow for big files and can slow down your computer, so for those it's better to go with the second method. The first method is good for text files and for files that are very important. If you want to choose the first method, click on "Yes". The second method checks only the file name and size. If two different images have the same file name and file size but are in two different folders, the second method will identify them as duplicates even though their contents differ. But the second method lets the utility scan the folders and identify duplicates much faster. So, if you have less important big movie files, choose the second method by selecting "No".

5. Now the utility will scan for duplicate files and folders and show you the percentage completed.
JJ Clean FF UI
6. Once the scanning process is completed, any duplicate files found will be listed in a separate window, where you can click on a checkbox to delete each one.
JJ Clean FF UI
7. Once you click on a checkbox, you will be prompted to confirm, and once you choose "Yes", the file or folder will be permanently deleted from your HD.
JJ Clean FF UI
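The two comparison methods from step 4 can be sketched roughly as follows. This is only an illustration under my own assumptions, not the actual JJCleanFF code: the class and method names (DuplicateFinderSketch, sha512Key, nameSizeKey, findDuplicates) are hypothetical. Each file gets a key, either its SHA-512 hash or its name plus size, and files that share a key are grouped as duplicates.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class, not part of JJCleanFF.
class DuplicateFinderSketch {

    /** Method 1 key: hex-encoded SHA-512 of the file contents (reads the whole file). */
    static String sha512Key(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Method 2 key: file name and size only (fast, but ignores the contents). */
    static String nameSizeKey(Path file) throws IOException {
        return file.getFileName() + "|" + Files.size(file);
    }

    /** Groups all regular files under root by key and keeps groups with two or more files. */
    static Map<String, List<Path>> findDuplicates(Path root, boolean useHash)
            throws IOException, NoSuchAlgorithmException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, List<Path>> groups = new HashMap<>();
        for (Path file : files) {
            String key = useHash ? sha512Key(file) : nameSizeKey(file);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(file);
        }
        groups.values().removeIf(group -> group.size() < 2);
        return groups;
    }

    public static void main(String[] args) throws Exception {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        boolean useHash = args.length > 1 && args[1].equals("hash");
        // Only lists the duplicate groups; nothing is deleted.
        findDuplicates(root, useHash).values().forEach(System.out::println);
    }
}
```

Pass a folder as the first argument and "hash" as the second to use the first method; without it, the sketch falls back to the name-and-size comparison. It only prints the groups, so deleting stays a manual decision.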

I've tested this utility a few times on my machine by creating dummy folders and files and haven't found any issues yet. I have also deleted many files on my local external disk using this utility. Please check it out and let me know if there are any issues.

I do not accept any responsibility and will not be liable for any loss or damage suffered by you whatsoever arising out of the usage of this utility.