Using Seek & Destroy Music Duplicates (www.pekarna.si)
Q: How do I start searching for duplicates?
First you have to define roots. Roots are the directories in which the search and analysis of music are performed. Use the Roots button to define them, then use the Scan button.
Q: When should I lock a root? (Using the 'Lock' checkbox in the table of roots in the Manage Roots dialog.)
Lock a root when it holds a collection of music that must never be offered for (re)moval. For example: you have an input directory 'c:\soulseek-input' and two external hard disks with the master collections of music at 'e:\music' and 'f:\music'. You can then define three roots: ('c:\soulseek-input', unlocked), ('e:\music', locked) and ('f:\music', locked). When you start the search, the program compares the music inside all chosen roots, but offers to remove duplicates only from the unlocked root 'c:\soulseek-input'.
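The locking rule can be sketched in a few lines of Java. This is only an illustration of the idea, not the program's actual code: the class names Root and RootLocking and the method isRemovable are made up for this example, and the paths use forward slashes so the sketch behaves the same on every platform.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical model of a root: a directory plus the 'Lock' flag.
final class Root {
    final Path dir;
    final boolean locked;

    Root(String dir, boolean locked) {
        this.dir = Paths.get(dir).normalize();
        this.locked = locked;
    }
}

final class RootLocking {
    // A file may be offered for removal only if it lies under at least
    // one unlocked root and under no locked root.
    static boolean isRemovable(Path file, List<Root> roots) {
        Path f = file.normalize();
        boolean inUnlockedRoot = false;
        for (Root r : roots) {
            if (f.startsWith(r.dir)) {
                if (r.locked) {
                    return false; // a locked root always protects its files
                }
                inUnlockedRoot = true;
            }
        }
        return inUnlockedRoot;
    }
}
```

With the three roots from the example above, a file under 'c:/soulseek-input' would be removable, while files under 'e:/music' or 'f:/music' would not.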
Q: How are the duplicates presented?
In the main table. Duplicates are visually distinguished by two alternating row colors (stripe effect :).
Q: What type of comparators are supported?
Data comparator for files of type "mp3", "ogg", "aac" and "wma".
Tag comparator for files of type "mp3" and "ogg". (A tag is the information about the song stored inside the song file.)
Sound comparator is in development.
You can check which comparator was used for a particular duplicate in the 'Match' column: the letter 'D' stands for the data comparator, 'T' for the tag comparator (and 'S' for the sound comparator).
Q: Which type of comparator is better?
The data comparator compares particular segments of the music files (CRC32 checksum); the tag comparator computes the similarity of the tag information. The data comparator is faster in the first scan and selects exact duplicates of the songs, but it has a lower score. The tag comparator is slower in the first scan and less accurate, but it has a higher score. A combination of both delivers good and relatively accurate results.
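To give a feeling for what a CRC32-based data comparison looks like, here is a minimal Java sketch. It is an assumption-laden illustration: which segments the program really checksums is not documented here, so the offset and length parameters, the class name SegmentChecksum and both method names are hypothetical. Only java.util.zip.CRC32 is a real JDK class.

```java
import java.util.zip.CRC32;

final class SegmentChecksum {
    // CRC32 of the bytes [offset, offset + length) of 'data'.
    // The range is clamped so it never runs past the array bounds.
    static long crc32Of(byte[] data, int offset, int length) {
        int from = Math.min(Math.max(offset, 0), data.length);
        int len = Math.min(Math.max(length, 0), data.length - from);
        CRC32 crc = new CRC32();
        crc.update(data, from, len);
        return crc.getValue();
    }

    // Two files are treated as data-equal on a segment when the
    // checksums of that segment match (assumed rule for this sketch).
    static boolean sameSegment(byte[] a, byte[] b, int offset, int length) {
        return crc32Of(a, offset, length) == crc32Of(b, offset, length);
    }
}
```

Comparing a segment of the audio data rather than the whole file is one way two copies of a song with different tags could still match, which would fit the "exact duplicates" behaviour described above.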
Q: How can I overrule the program's decisions about what is a duplicate of what?
By checking or unchecking rows in the table of duplicates (column 'X'). Later, when you decide to remove the duplicates, only the checked rows (files) are removed.
Q: How does the program help me in the duplicate validation phase?
1. You can listen to the song by double-clicking the row.
2. You can open the folder of the song by selecting the row and then pressing the 'E' key.
3. If the maximum difference between the song lengths within a group of duplicates is bigger than 5 seconds, the 'Time' column is shown in bold!
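The 5-second rule from point 3 boils down to a small check. The Java sketch below shows one way to express it; the class name TimeSpread and the method exceedsThreshold are invented for this example and say nothing about how the program implements it internally.

```java
final class TimeSpread {
    // Threshold taken from the text: a spread bigger than 5 seconds is flagged.
    static final int THRESHOLD_SECONDS = 5;

    // Returns true when the longest and the shortest song in a group of
    // duplicates differ by more than THRESHOLD_SECONDS.
    static boolean exceedsThreshold(int[] lengthsInSeconds) {
        if (lengthsInSeconds.length == 0) {
            return false; // an empty group has no spread
        }
        int min = lengthsInSeconds[0];
        int max = lengthsInSeconds[0];
        for (int seconds : lengthsInSeconds) {
            min = Math.min(min, seconds);
            max = Math.max(max, seconds);
        }
        return max - min > THRESHOLD_SECONDS;
    }
}
```

For example, a group with lengths of 200 and 206 seconds would be flagged, while a group with 200 and 205 seconds would not, because the difference has to be bigger than 5 seconds.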
Q: What happens when I press the Remove button?
The program asks you to define the destination folder to which the duplicates (the checked rows) will be moved, if it is not defined yet. It then gives you a chance to trigger the duplicate-removal process.
Q: Will the program delete the duplicates?
No, the program only moves the checked duplicates from the (unlocked) roots to the destination folder. It's up to you to delete them manually (if and when you decide to).
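"Move, don't delete" can be illustrated with the standard java.nio.file API. This is a minimal sketch under assumptions: the class name MoveDuplicates and the method moveTo are hypothetical, and the choice to fail instead of overwriting an existing file in the destination folder is made for this example only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class MoveDuplicates {
    // Moves 'file' into 'destinationDir' under the same file name and
    // returns the new location. Nothing is deleted: because no
    // REPLACE_EXISTING option is passed, Files.move throws an exception
    // if a file with that name already exists in the destination.
    static Path moveTo(Path file, Path destinationDir) throws IOException {
        Files.createDirectories(destinationDir);
        Path target = destinationDir.resolve(file.getFileName());
        return Files.move(file, target);
    }
}
```

After such a move the file is gone from its root but still sits in the destination folder, so a wrong decision can be undone by moving it back.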
Q: What are the common use-case scenarios of the program?
First usage:
1. Define roots. Take care to lock the master collections. (Master collections are collections that are organized and without :) the dupes.)
2. Perform the first scan. (slower)
3. Apply the 'Strict Tags' strategy ('Strategy' button).
--- if any dupes found ---
4. Validate the dupes.
5. Define the destination directory where the dupes are moved to.
6. Remove dupes (actually move them to the destination directory).
7. Apply the 'Less Strict Tags' strategy.
--- if any dupes found ---
8. Validate the dupes.
9. Remove dupes.
10. Apply the 'Loose Tags' strategy.
--- if any dupes found ---
11. Validate the dupes.
12. Remove dupes.
13. Save the session! (File->Save Session)
Next usages:
1. Open the saved session. (File->Open Session.. or File->Recent Sessions)
2. Perform scan. (fast)
--- if any dupes found ---
3. Validate the dupes using the 'Strict Tags', 'Less Strict Tags' and 'Loose Tags' strategies ('Strategy' button).
4. Remove dupes (if any).
5. Save the session!
Q: What is a session?
A session is the saved state of the program. It includes information about the scanned music, the destination (remove-to) directory, and the manually checked/unchecked rows.
Q: Why should I use sessions?
Behind the session is an algorithm that allows incremental updates of the search-for-duplicates results. In other words: subsequent searches are performed very fast.
Q: What is the minimum hardware configuration to run the app?
This app is not greedy, but some operations also depend on the number of duplicates. The app will use at most 96 MB of RAM (you can change the JVM -Xmx setting if you want). A friend of mine tested the app on a P3 processor and, as he said, it works fine.
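If you change the -Xmx setting (a standard JVM option, for example -Xmx256m), you can check which limit the JVM really uses with a few lines of Java. The class name HeapLimit and its method are made up for this sketch; Runtime.maxMemory() is a real JDK call. How the app itself is launched is not described here, so no launch command is shown.

```java
final class HeapLimit {
    // Maximum amount of memory the running JVM will try to use, in megabytes.
    // The value follows the -Xmx setting, but the JVM may report slightly
    // less than the configured number.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }
}
```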
Q: Which actions are slow, if the music collection is large?
On my collection of about 38000 songs the following operations are slow:
1: applying the Loose Tags match strategy (35 seconds)
2: saving the session (20 seconds)
3: opening the session (40 seconds)
4: after removing the dupes: recalculation of changes (25 seconds)
5: exporting to XML (1 minute 10 seconds)
I'm developing this app on an Acer Aspire 1712 (3GHz) (great laptop).
Q: Can I use this software?
Of course you can. I've released it under the GNU General Public License (GPL).
Q: Where are the sources?
Sources will be available on SourceForge by 15 March 2006.
Q: Why should I donate?
You can guess why.. :)
Q: Other resources?
Enjoy your life!
Igor Tavčar. igor_tavcar@t-2.net
(Ljubljana, 4 March 2006)