Monday, November 19, 2018

Removing duplicate files and folders - Cleaning up my hard disks

Ever since I started using a computer about 20 years ago, I have carefully kept every file, thinking I would need it in the future. Quite frankly, I have not needed a single file I created more than 10 years ago. Back then, I hoped job interviewers might ask to see the work I had done, but they never did. Having seen HR processes from the inside at a few of my previous jobs, I can say that an applicant is lucky if an employer even has the mindset to check whether they have done anything in the past, and what. All these new, fancy HR processes keep people so busy completing too many interviews per open position, just for reporting's sake, that they do not have time to actually go and check an applicant's past projects and the quality of their work.

My habit of collecting and saving every file I make for my projects has reached the point where I need multiple high-capacity hard disks and cloud space. I save my project files, important tutorials, camera RAW files, picture and video edits, templates, songs I like, movies I like, and many other files, including automated backups every three months. My digital hoarding disorder has grown so far out of proportion that I keep backups of backups in Google's cloud and important documents on OneDrive.

So, I now have about 1 TB of cloud space, 8 TB of local external storage across multiple devices, and a bunch of SD cards and pen drives that add up to about 500 GB in total. Whoever reads this blog post five or ten years from now will smirk at these numbers, just as we now laugh at the 20 GB hard disk in our old P4 computers (Millennium super computers 😛). By present-day specs, though, I think it is crazy for an individual PC user to keep this much data. At the place I worked for in 2005, the whole organization had about 2 TB of space to manage its data.

For the last few months, most of my weekends were wasted on organizing these files and eliminating duplicate files and folders. Within a few hours of starting, I realized it is not humanly possible to go through thousands of folders and files one by one and delete the duplicates. I was not ready to trust the HD clean-up .exe utilities on the internet either. So, after a bit of googling, I wrote a bat script that loops through the file list, identifies the duplicate files, and prints their details in the cmd window. Even then, it was tedious to go through the list one by one and delete, and dangerous to let the program delete everything automatically at once.
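The bat script itself is not shown in this post, but the idea it describes (loop through every file, flag the ones that look alike, print the details) can be sketched in Java. This is a minimal illustration under stated assumptions, not the JJCleanFF code: it treats two files as duplicates when they share the same file name and size, and the class name DuplicateFinder is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: report files under a folder that share the same name and size.
public class DuplicateFinder {

    // Groups every regular file under root by "name|size" and keeps only groups of 2+ files.
    public static Map<String, List<Path>> findDuplicates(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, List<Path>> groups = new LinkedHashMap<>();
        for (Path file : files) {
            String key = file.getFileName() + "|" + Files.size(file);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(file);
        }
        // A key that matched only one file is not a duplicate, so drop it.
        groups.values().removeIf(list -> list.size() < 2);
        return groups;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Only prints the groups; nothing is deleted.
        for (Map.Entry<String, List<Path>> group : findDuplicates(root).entrySet()) {
            System.out.println("Possible duplicates (" + group.getKey() + "):");
            for (Path p : group.getValue()) {
                System.out.println("    " + p);
            }
        }
    }
}
```

After compiling with javac, run it with a folder path as the only argument (for example, java DuplicateFinder D:\Backups). Because it only prints, it is safe to run on a real disk to see what a clean-up would touch.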

After a few more attempts to modify the bat code to delete files and folders selectively by prompting the user, I found the code messy, complex, and not very user-friendly, since the user still had to go through the items one by one. Keeping the same bat logic at the core, I wrote a Java program with a few classes that achieved much better usability. I struggled a lot with the GUI freezing while a thread was busy: when java.io was busy reading the files, whether in the current thread or in a separate thread, the GUI became unresponsive and I could not see the progress of the process. Swing paints the window and handles clicks on a single thread, the Event Dispatch Thread, so any long-running work that blocks that thread freezes the window. After drilling Stack Overflow with everything I had, I found out that SwingWorker is the answer to this problem. From there, it was easy peasy: within a few hours the "JJCleanFF" :) utility was ready, and I was able to delete a lot of duplicate folders and files with it.
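Here is a minimal sketch of the SwingWorker pattern that fixes the freeze. The slow work goes in doInBackground(), which SwingWorker runs on a background thread; setProgress() and publish() hand progress back to the Event Dispatch Thread, where a JProgressBar can be updated safely. The class name ScanWorker is hypothetical, and summing file sizes merely stands in for whatever per-file work the real utility does.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import javax.swing.JFrame;
import javax.swing.JProgressBar;
import javax.swing.SwingUtilities;
import javax.swing.SwingWorker;

// Hypothetical sketch: do slow per-file work off the Event Dispatch Thread and report progress.
public class ScanWorker extends SwingWorker<Long, String> {
    private final List<File> files;

    public ScanWorker(List<File> files) {
        this.files = files;
    }

    @Override
    protected Long doInBackground() {
        // Runs on a background worker thread, so slow java.io calls here cannot freeze the GUI.
        long totalBytes = 0;
        for (int i = 0; i < files.size(); i++) {
            File f = files.get(i);
            totalBytes += f.length();                  // stand-in for the real per-file work
            publish(f.getName());                      // delivered later, in batches, to process()
            setProgress((i + 1) * 100 / files.size()); // fires a "progress" property change (0..100)
        }
        return totalBytes;
    }

    @Override
    protected void process(List<String> names) {
        // Runs on the Event Dispatch Thread, so it is safe to touch Swing components here.
        System.out.println("Scanned: " + names.get(names.size() - 1));
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JProgressBar bar = new JProgressBar(0, 100);
            bar.setStringPainted(true); // draws the percentage as text on the bar
            JFrame frame = new JFrame("Scanning...");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(bar);
            frame.pack();
            frame.setVisible(true);

            File[] listed = new File(args.length > 0 ? args[0] : ".").listFiles();
            ScanWorker worker = new ScanWorker(listed == null ? Arrays.<File>asList() : Arrays.asList(listed));
            worker.addPropertyChangeListener(evt -> {
                // SwingWorker notifies property change listeners on the Event Dispatch Thread.
                if ("progress".equals(evt.getPropertyName())) {
                    bar.setValue((Integer) evt.getNewValue());
                }
            });
            worker.execute(); // returns immediately; the scan continues in the background
        });
    }
}
```

A SwingWorker instance is designed to run only once, so a new worker would be created for each scan.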

For now, I am sharing the jar file and instructions for using the utility here. After cleaning up the code a little and adding documentation, I will post the source code on GitHub and share the link in this post. So, here is how to use it:

1. Download the executable jar file from https://www.jeyaramj.com/downloads/JJCleanFF.jar
2. Open the downloaded jar file (double-click it, or run java -jar JJCleanFF.jar from a command prompt; Java must be installed).
JJ Clean FF Main Window
3. In the main window, click "Select a folder" and choose the parent folder whose folders and files should be checked for duplicates.
JJ Clean FF Browse Window
4. Once the folder is selected, a message prompt asks you to choose the file comparison method. I coded two ways to compare files. The first method generates a digital hash (SHA-512) of each file and checks it against the other folders and files. For big files this method is slow and can bog down your computer, so the second method is the better choice there; the first method is good for text files and for files that are too important to risk. To choose the first method, click "Yes". The second method only checks the file name and size. If two different images have the same file name and file size but sit in two different folders, the second method will identify them as duplicates. However, the second method lets the utility scan the folders and identify duplicates much faster. So, if you have less important, big movie files, choose the second method by clicking "No".
JJ Clean FF UI
5. The utility now scans for duplicate files and folders and shows the percentage completed.
JJ Clean FF UI

6. Once scanning is complete, any duplicate files are listed in a separate window, where you can tick the check box next to an item to delete it.
JJ Clean FF UI

7. When you tick a check box, you are prompted to confirm; once you choose "Yes", the file or folder is permanently deleted from your hard disk.
JJ Clean FF UI
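The two comparison methods from step 4 could look roughly like this in Java. This is a sketch, not the utility's actual source: the class and method names are hypothetical, and the hash is computed with the standard MessageDigest API, reading the file in chunks so big files do not have to fit in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the two comparison methods ("Yes" = hash, "No" = name + size).
public class FileComparator {

    // Method 1: SHA-512 of the full contents, returned as a 128-character hex string.
    public static String sha512Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available", e);
        }
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read); // feed the file to the hash chunk by chunk
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Method 2: file name plus size only; fast, but different files can share the same key.
    public static String nameAndSizeKey(Path file) throws IOException {
        return file.getFileName() + "|" + Files.size(file);
    }

    // Content check: compare sizes first, and hash only when the sizes match.
    public static boolean sameContent(Path a, Path b) throws IOException {
        return Files.size(a) == Files.size(b) && sha512Hex(a).equals(sha512Hex(b));
    }
}
```

Comparing sizes before hashing is a common shortcut: files of different lengths cannot be identical, so the expensive SHA-512 step only runs for files that could actually match.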
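Steps 6 and 7 (confirm, then delete permanently) can be sketched with JOptionPane and java.nio.file. Again, this is a hypothetical illustration: ConfirmDelete and its methods are made-up names, and the folder deletion simply removes the deepest paths first so that every folder is already empty when its turn comes.

```java
import java.awt.Component;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.swing.JOptionPane;

// Hypothetical sketch: ask for confirmation, then permanently delete a file or folder.
public class ConfirmDelete {

    // Shows a Yes/No dialog and deletes only when the user picks "Yes". Returns true if deleted.
    public static boolean confirmAndDelete(Component parent, Path target) throws IOException {
        int choice = JOptionPane.showConfirmDialog(parent,
                "Permanently delete " + target + "?", "Confirm delete",
                JOptionPane.YES_NO_OPTION, JOptionPane.WARNING_MESSAGE);
        if (choice != JOptionPane.YES_OPTION) {
            return false;
        }
        deleteRecursively(target);
        return true;
    }

    // Deletes a file, or a folder and everything inside it. Nothing goes to the Recycle Bin.
    public static void deleteRecursively(Path target) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(target)) {
            // Reverse order puts children before their parent folder.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Files.delete bypasses the Recycle Bin, which is why the confirmation prompt matters before anything is removed.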

I tested it a few times on my machine by creating dummy folders and files and haven't found any issues yet. I have deleted many files on my local external disk using this utility. Please try it and let me know if you run into any issues.

* I do not accept any responsibility and will not be liable for any loss or damage whatsoever suffered by you arising out of the use of this utility.