
Duplicate File Finder (File Comparrison, File Compare)

Posted: Sat Jan 31, 2015 6:48 pm
by parkerc
As the whole family backs up their devices to the QNAP and I also sync various photos and documents from different sources, a really great feature would be the ability to find duplicate files and be given the option to delete them or not, to help free up space.

I've previously used a PC app called 'WinMerge', which works very well: it shows you the duplicates and, rather than basing the decision on the file name alone, delves a little deeper so you know they are really identical.

If QNAP could include something like this it would add great value, and give people the chance to optimise storage etc.

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Wed Feb 11, 2015 5:11 pm
by ZenoNap
I agree. I recently found myself backing up all my data and checking everything two or three times to be sure nothing was left behind.

Not tested, but viewtopic.php?f=24&t=96769#p428619 seems like a start.

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Thu Feb 12, 2015 2:57 am
by pwilson
ZenoNap wrote: I agree. I recently found myself backing up all my data and checking everything two or three times to be sure nothing was left behind.

Not tested, but viewtopic.php?f=24&t=96769#p428619 seems like a start.


Why didn't you simply quote it?

In message: Re: Folder compare, pwilson wrote: rsync can already "pull" from another rsync server.

Code:

rsync --help
rsync  version 3.0.7  protocol version 30
Copyright (C) 1996-2009 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 32-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.

rsync is a file transfer program capable of efficient remote update
via a fast differencing algorithm.

Usage: rsync [OPTION]... SRC [SRC]... DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST::DEST
  or   rsync [OPTION]... SRC [SRC]... rsync://[USER@]HOST[:PORT]/DEST
  or   rsync [OPTION]... [USER@]HOST:SRC [DEST]
  or   rsync [OPTION]... [USER@]HOST::SRC [DEST]
  or   rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]
The ':' usages connect via remote shell, while '::' & 'rsync://' usages connect
to an rsync daemon, and require SRC or DEST to start with a module name.

Options
 -v, --verbose               increase verbosity
 -q, --quiet                 suppress non-error messages
     --no-motd               suppress daemon-mode MOTD (see manpage caveat)
 -c, --checksum              skip based on checksum, not mod-time & size
 -a, --archive               archive mode; equals -rlptgoD (no -H,-A,-X)
     --no-OPTION             turn off an implied OPTION (e.g. --no-D)
 -r, --recursive             recurse into directories
 -R, --relative              use relative path names
     --no-implied-dirs       don't send implied dirs with --relative
 -b, --backup                make backups (see --suffix & --backup-dir)
     --backup-dir=DIR        make backups into hierarchy based in DIR
     --suffix=SUFFIX         set backup suffix (default ~ w/o --backup-dir)
 -u, --update                skip files that are newer on the receiver
     --inplace               update destination files in-place (SEE MAN PAGE)
     --append                append data onto shorter files
     --append-verify         like --append, but with old data in file checksum
 -d, --dirs                  transfer directories without recursing
 -l, --links                 copy symlinks as symlinks
 -L, --copy-links            transform symlink into referent file/dir
     --copy-unsafe-links     only "unsafe" symlinks are transformed
     --safe-links            ignore symlinks that point outside the source tree
 -k, --copy-dirlinks         transform symlink to a dir into referent dir
 -K, --keep-dirlinks         treat symlinked dir on receiver as dir
 -H, --hard-links            preserve hard links
 -p, --perms                 preserve permissions
 -E, --executability         preserve the file's executability
     --chmod=CHMOD           affect file and/or directory permissions
 -A, --acls                  preserve ACLs (implies --perms)
 -X, --xattrs                preserve extended attributes
 -o, --owner                 preserve owner (super-user only)
 -g, --group                 preserve group
     --devices               preserve device files (super-user only)
     --specials              preserve special files
 -D                          same as --devices --specials
 -t, --times                 preserve modification times
 -O, --omit-dir-times        omit directories from --times
     --super                 receiver attempts super-user activities
     --fake-super            store/recover privileged attrs using xattrs
 -S, --sparse                handle sparse files efficiently
 -n, --dry-run               perform a trial run with no changes made
 -W, --whole-file            copy files whole (without delta-xfer algorithm)
 -x, --one-file-system       don't cross filesystem boundaries
 -B, --block-size=SIZE       force a fixed checksum block-size
 -e, --rsh=COMMAND           specify the remote shell to use
     --rsync-path=PROGRAM    specify the rsync to run on the remote machine
     --existing              skip creating new files on receiver
     --ignore-existing       skip updating files that already exist on receiver
     --remove-source-files   sender removes synchronized files (non-dirs)
     --del                   an alias for --delete-during
     --delete                delete extraneous files from destination dirs
     --delete-before         receiver deletes before transfer, not during
     --delete-during         receiver deletes during transfer (default)
     --delete-delay          find deletions during, delete after
     --delete-after          receiver deletes after transfer, not during
     --delete-excluded       also delete excluded files from destination dirs
     --ignore-errors         delete even if there are I/O errors
     --force                 force deletion of directories even if not empty
     --max-delete=NUM        don't delete more than NUM files
     --max-size=SIZE         don't transfer any file larger than SIZE
     --min-size=SIZE         don't transfer any file smaller than SIZE
     --partial               keep partially transferred files
     --partial-dir=DIR       put a partially transferred file into DIR
     --delay-updates         put all updated files into place at transfer's end
 -m, --prune-empty-dirs      prune empty directory chains from the file-list
     --numeric-ids           don't map uid/gid values by user/group name
     --timeout=SECONDS       set I/O timeout in seconds
     --contimeout=SECONDS    set daemon connection timeout in seconds
 -I, --ignore-times          don't skip files that match in size and mod-time
     --size-only             skip files that match in size
     --modify-window=NUM     compare mod-times with reduced accuracy
 -T, --temp-dir=DIR          create temporary files in directory DIR
 -y, --fuzzy                 find similar file for basis if no dest file
     --compare-dest=DIR      also compare destination files relative to DIR
     --copy-dest=DIR         ... and include copies of unchanged files
     --link-dest=DIR         hardlink to files in DIR when unchanged
 -z, --compress              compress file data during the transfer
     --compress-level=NUM    explicitly set compression level
     --skip-compress=LIST    skip compressing files with a suffix in LIST
 -C, --cvs-exclude           auto-ignore files the same way CVS does
 -f, --filter=RULE           add a file-filtering RULE
 -F                          same as --filter='dir-merge /.rsync-filter'
                             repeated: --filter='- .rsync-filter'
     --exclude=PATTERN       exclude files matching PATTERN
     --exclude-from=FILE     read exclude patterns from FILE
     --include=PATTERN       don't exclude files matching PATTERN
     --include-from=FILE     read include patterns from FILE
     --files-from=FILE       read list of source-file names from FILE
 -0, --from0                 all *-from/filter files are delimited by 0s
 -s, --protect-args          no space-splitting; only wildcard special-chars
     --address=ADDRESS       bind address for outgoing socket to daemon
     --port=PORT             specify double-colon alternate port number
     --sockopts=OPTIONS      specify custom TCP options
     --blocking-io           use blocking I/O for the remote shell
     --stats                 give some file-transfer stats
 -8, --8-bit-output          leave high-bit chars unescaped in output
 -h, --human-readable        output numbers in a human-readable format
     --progress              show progress during transfer
 -P                          same as --partial --progress
 -i, --itemize-changes       output a change-summary for all updates
     --out-format=FORMAT     output updates using the specified FORMAT
     --log-file=FILE         log what we're doing to the specified FILE
     --log-file-format=FMT   log updates using the specified FMT
     --password-file=FILE    read daemon-access password from FILE
     --list-only             list the files instead of copying them
     --bwlimit=KBPS          limit I/O bandwidth; KBytes per second
     --write-batch=FILE      write a batched update to FILE
     --only-write-batch=FILE like --write-batch but w/o updating destination
     --read-batch=FILE       read a batched update from FILE
     --protocol=NUM          force an older protocol version to be used
     --iconv=CONVERT_SPEC    request charset conversion of filenames
     --qnap-mode=mode        0:Normal, 1:QRAID1, 2:USB copy 3:HD copy USB
     --check-dest             Check if the destination path is valid
     --password=WORD         the password of QNAP mode
     --sever-mode=mode       0:Normal, 1:QNAP in daemon-mode
     --schedule=name         specify the schedule name
 -4, --ipv4                  prefer IPv4
 -6, --ipv6                  prefer IPv6
     --version               print version number
(-h) --help                  show this help (-h works with no other options)

Use "rsync --daemon --help" to see the daemon-mode command-line options.
Please see the rsync(1) and rsyncd.conf(5) man pages for full documentation.
See http://rsync.samba.org/ for updates, bug reports, and answers


To compare directories, simply install "diffutils" via Optware:

Installation:

Code:

ipkg update ; ipkg install diffutils


Options:

Code:

Usage: /opt/bin/diff [OPTION]... FILES
Compare FILES line by line.

Mandatory arguments to long options are mandatory for short options too.
      --normal                  output a normal diff (the default)
  -q, --brief                   report only when files differ
  -s, --report-identical-files  report when two files are the same
  -c, -C NUM, --context[=NUM]   output NUM (default 3) lines of copied context
  -u, -U NUM, --unified[=NUM]   output NUM (default 3) lines of unified context
  -e, --ed                      output an ed script
  -n, --rcs                     output an RCS format diff
  -y, --side-by-side            output in two columns
  -W, --width=NUM               output at most NUM (default 130) print columns
      --left-column             output only the left column of common lines
      --suppress-common-lines   do not output common lines

  -p, --show-c-function         show which C function each change is in
  -F, --show-function-line=RE   show the most recent line matching RE
      --label LABEL             use LABEL instead of file name
                                  (can be repeated)

  -t, --expand-tabs             expand tabs to spaces in output
  -T, --initial-tab             make tabs line up by prepending a tab
      --tabsize=NUM             tab stops every NUM (default 8) print columns
      --suppress-blank-empty    suppress space or tab before empty output lines
  -l, --paginate                pass output through `pr' to paginate it

  -r, --recursive                 recursively compare any subdirectories found
  -N, --new-file                  treat absent files as empty
      --unidirectional-new-file   treat absent first files as empty
      --ignore-file-name-case     ignore case when comparing file names
      --no-ignore-file-name-case  consider case when comparing file names
  -x, --exclude=PAT               exclude files that match PAT
  -X, --exclude-from=FILE         exclude files that match any pattern in FILE
  -S, --starting-file=FILE        start with FILE when comparing directories
      --from-file=FILE1           compare FILE1 to all operands;
                                    FILE1 can be a directory
      --to-file=FILE2             compare all operands to FILE2;
                                    FILE2 can be a directory

  -i, --ignore-case               ignore case differences in file contents
  -E, --ignore-tab-expansion      ignore changes due to tab expansion
  -b, --ignore-space-change       ignore changes in the amount of white space
  -w, --ignore-all-space          ignore all white space
  -B, --ignore-blank-lines        ignore changes whose lines are all blank
  -I, --ignore-matching-lines=RE  ignore changes whose lines all match RE

  -a, --text                      treat all files as text
      --strip-trailing-cr         strip trailing carriage return on input

  -D, --ifdef=NAME                output merged file with `#ifdef NAME' diffs
      --GTYPE-group-format=GFMT   format GTYPE input groups with GFMT
      --line-format=LFMT          format all input lines with LFMT
      --LTYPE-line-format=LFMT    format LTYPE input lines with LFMT
    These format options provide fine-grained control over the output
      of diff, generalizing -D/--ifdef.
    LTYPE is `old', `new', or `unchanged'.  GTYPE is LTYPE or `changed'.
    GFMT (only) may contain:
      %<  lines from FILE1
      %>  lines from FILE2
      %=  lines common to FILE1 and FILE2
      %[-][WIDTH][.[PREC]]{doxX}LETTER  printf-style spec for LETTER
        LETTERs are as follows for new group, lower case for old group:
          F  first line number
          L  last line number
          N  number of lines = L-F+1
          E  F-1
          M  L+1
      %(A=B?T:E)  if A equals B then T else E
    LFMT (only) may contain:
      %L  contents of line
      %l  contents of line, excluding any trailing newline
      %[-][WIDTH][.[PREC]]{doxX}n  printf-style spec for input line number
    Both GFMT and LFMT may contain:
      %%  %
      %c'C'  the single character C
      %c'\OOO'  the character with octal code OOO
      C    the character C (other characters represent themselves)

  -d, --minimal            try hard to find a smaller set of changes
      --horizon-lines=NUM  keep NUM lines of the common prefix and suffix
      --speed-large-files  assume large files and many scattered small changes

      --help               display this help and exit
  -v, --version            output version information and exit

FILES are `FILE1 FILE2' or `DIR1 DIR2' or `DIR FILE...' or `FILE... DIR'.
If --from-file or --to-file is given, there are no restrictions on FILE(s).
If a FILE is `-', read standard input.
Exit status is 0 if inputs are the same, 1 if different, 2 if trouble.

Report bugs to: bug-diffutils@gnu.org
GNU diffutils home page: <http://www.gnu.org/software/diffutils/>
General help using GNU software: <http://www.gnu.org/gethelp/>


Example:

Code:

/opt/bin/diff /share/MyFolder /share/USBDisk1
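
Note that without -r, diff only compares the files at the top level of the two folders. A minimal sketch of a recursive, brief comparison follows, using the -r and -q options listed in the help text above. The function name compare_trees is made up for illustration, and on a QNAP with Optware the binary would be /opt/bin/diff as in the example.

```shell
# Recursively compare two directory trees and report only which files differ
# or exist on one side only (-r: recurse, -q: brief output).
# compare_trees is a hypothetical helper name, not part of diffutils.
compare_trees() {
    diff -rq "$1" "$2"
}

# Usage with the placeholder paths from the example above:
# compare_trees /share/MyFolder /share/USBDisk1
```

As the help text says, the exit status is 0 if the inputs are the same, 1 if they differ and 2 if there was trouble, so the result can also be checked from a script.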



For this effort, I actually think installing "findutils" via Optware would be more useful than "diffutils". "Findutils" provides both the "find" and "xargs" commands:

Code:

/opt/bin/find --help
Usage: /opt/bin/find [path...] [expression]

default path is the current directory; default expression is -print
expression may consist of: operators, options, tests, and actions:

operators (decreasing precedence; -and is implicit where no others are given):
      ( EXPR )   ! EXPR   -not EXPR   EXPR1 -a EXPR2   EXPR1 -and EXPR2
      EXPR1 -o EXPR2   EXPR1 -or EXPR2   EXPR1 , EXPR2

positional options (always true): -daystart -follow -regextype

normal options (always true, specified before other expressions):
      -depth --help -maxdepth LEVELS -mindepth LEVELS -mount -noleaf
      --version -xdev -ignore_readdir_race -noignore_readdir_race

tests (N can be +N or -N or N): -amin N -anewer FILE -atime N -cmin N
      -cnewer FILE -ctime N -empty -false -fstype TYPE -gid N -group NAME
      -ilname PATTERN -iname PATTERN -inum N -iwholename PATTERN -iregex PATTERN
      -links N -lname PATTERN -mmin N -mtime N -name PATTERN -newer FILE
      -nouser -nogroup -path PATTERN -perm [+-]MODE -regex PATTERN
      -wholename PATTERN -size N[bcwkMG] -true -type [bcdpflsD] -uid N
      -used N -user NAME -xtype [bcdpfls]

actions: -delete -print0 -printf FORMAT -fprintf FILE FORMAT -print
      -fprint0 FILE -fprint FILE -ls -fls FILE -prune -quit
      -exec COMMAND ; -exec COMMAND {} + -ok COMMAND ;
      -execdir COMMAND ; -execdir COMMAND {} + -okdir COMMAND ;

Report (and track progress on fixing) bugs via the findutils bug-reporting
page at http://savannah.gnu.org/ or, if you have no web access, by sending
email to <bug-findutils@gnu.org>.


and

Code:

/opt/bin/xargs --help
Usage: /opt/bin/xargs [-0prtx] [--interactive] [--null] [-d|--delimiter=delim]
       [-E eof-str] [-e[eof-str]]  [--eof[=eof-str]]
       [-L max-lines] [-l[max-lines]] [--max-lines[=max-lines]]
       [-I replace-str] [-i[replace-str]] [--replace[=replace-str]]
       [-n max-args] [--max-args=max-args]
       [-s max-chars] [--max-chars=max-chars]
       [-P max-procs]  [--max-procs=max-procs] [--show-limits]
       [--verbose] [--exit] [--no-run-if-empty] [--arg-file=file]
       [--version] [--help] [command [initial-arguments]]

Report bugs to <bug-findutils@gnu.org>.


Google is your friend when it comes to learning how to use these commands.
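
As a starting point, here is a rough sketch of how "find" and "xargs" could be combined to list files with identical content. It assumes GNU versions of find, xargs, md5sum, sort and uniq are on the PATH (the -w and --all-repeated options of uniq are GNU extensions, so a BusyBox uniq may not accept them). The function name find_dupes is made up for illustration.

```shell
# List groups of files under a directory that share an MD5 checksum.
# find prints NUL-separated names so paths with spaces survive;
# xargs -0 -r runs md5sum on them (and runs nothing if no files are found);
# sort puts identical checksums next to each other;
# uniq compares only the first 32 characters (the checksum) and prints
# every repeated line, with a blank line between groups.
# find_dupes is a hypothetical helper name.
find_dupes() {
    find "$1" -type f -print0 |
        xargs -0 -r md5sum |
        sort |
        uniq -w32 --all-repeated=separate
}

# Usage (placeholder path):
# find_dupes /share/MyFolder
```

Nothing is deleted here. The output is only a list to review by hand, and file names containing newlines or backslashes are printed by md5sum in an escaped form that this sketch does not handle.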

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Thu Feb 12, 2015 7:46 am
by smitty123
+1

Torrent is so bad at remembering what's already been downloaded. It would be cool to have such an app run in the box.

Something simple could be based on the media server app: when it runs, it sets up a database of video, image and music files. Why not extend that functionality to all files, keeping just basic info such as the path and the MD5 or CRC value of each file we drop in the box? Then we could run a report app to find duplicates that way.
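
To make the idea concrete, here is a rough sketch of such an index and report done with standard tools rather than a real database. It is not how the media server app works. It assumes GNU find, md5sum, sort and awk are available, and the names build_index and report_dupes are made up for illustration.

```shell
# Build a plain-text index with one "<md5>  <path>" line per file
# (this is simply md5sum's own output format).
# $1 = directory to index, $2 = index file to write.
# Keep the index file outside the indexed directory.
build_index() {
    find "$1" -type f -exec md5sum {} + > "$2"
}

# Report every indexed path whose checksum occurs more than once.
# The files themselves are not read again; only the index is used.
report_dupes() {
    sort "$1" | awk '
        { hash = substr($0, 1, 32) }                  # checksum of this line
        hash == prev { if (!shown) print first        # first member of group
                       print; shown = 1; next }
        { prev = hash; first = $0; shown = 0 }        # start of a new group
    '
}

# Usage (placeholder paths):
# build_index /share/MyFolder /tmp/nas_index.txt
# report_dupes /tmp/nas_index.txt
```

The slow part (hashing every file) happens once when the index is built. The report can then be rerun as often as needed at almost no cost.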

A proper interface for managing files on the NAS, like a good dupe finder for Windows, would make finding and deleting dupes much simpler. We'd need to see the file to make sure it's the right one.

The point would be to have the app run natively in the box rather than install some program and use the ethernet port/router to do this. It may not be fast, but I prefer to do it all inside the box.

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Thu Apr 30, 2015 8:37 pm
by thinkandbuild
I'm astounded that in this day and age, a command line method is the best available, and even then the two suspected folders need to be identified. What I envisage as being a very useful app would be an indexer that "knows" what is on the NAS at any point in time, and compares all additions to the NAS with that list, perhaps then tagging files as likely duplicates.
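
Until something like that exists, the "compare each addition with a list" step can at least be sketched from the command line. The sketch below assumes an index file already exists in md5sum output format ("<md5>  <path>" per line), for example one produced by running md5sum over the existing files. The function name already_stored is made up for illustration.

```shell
# Check whether a new file's content is already listed in an index.
# $1 = index file in md5sum output format, $2 = the new file.
# Prints the matching index lines and returns 0 if the content is known,
# returns 1 (and prints nothing) if it is not.
# already_stored is a hypothetical helper name.
already_stored() {
    grep "^$(md5sum < "$2" | cut -c1-32) " "$1"
}

# Usage (placeholder paths):
# already_stored /tmp/nas_index.txt /share/Incoming/clip0001.mov \
#     && echo "likely duplicate"
```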

Short of that, a scan could be performed as required to find such duplicates, across ALL folders, rather than assuming you can guess where duplicates might exist.

One common source of duplicates for me was video files from a digital video camera. I would copy all video files from the DVC to my main PC without deleting them, and then do the same to a new folder a few weeks later, and again and again. I freed up a TB by finding and deleting these by making a copy to a local HDD and searching it using built-in Windows tools.

On my old TS412, that process was days of work. My new 453Pro makes it much quicker, but I would happily pay decent money for a NAS-based solution that didn't require me to learn UNIX and practice the dark art of command line and switch selection and placement.

Who's with me?

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Thu Apr 30, 2015 10:37 pm
by giopas
This link seems nice as well (yet to test to be honest): http://www.techrepublic.com/blog/linux- ... ting-time/

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Thu May 07, 2015 2:29 pm
by parkerc
Thanks for the suggestions.

My personal preference is still to see this feature as part of the GUI, maybe even an enhancement/extension to File Station.

Either run it on demand, or as a constant background task whenever new files are added. Checking a combination of file names, types and sizes to find duplicates (maybe hash values too?) would be a great optimisation feature, ensuring you then make full use of your valuable storage...
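
For illustration, combining sizes and hash values could look roughly like the sketch below: sizes are compared first because that is cheap, and checksums are only calculated for files whose size occurs more than once. It assumes GNU find (for -printf), xargs, md5sum, sort and uniq, and the function name is made up.

```shell
# Two-stage duplicate search.
# Stage 1: list every file size in bytes and keep the sizes that occur
#          more than once (uniq -d).
# Stage 2: for each such size, checksum only the files of exactly that
#          size ("-size Nc" means N bytes), then group equal checksums.
# dupes_by_size_then_hash is a hypothetical helper name.
dupes_by_size_then_hash() {
    find "$1" -type f -printf '%s\n' | sort -n | uniq -d |
    while read -r size; do
        find "$1" -type f -size "${size}c" -print0 | xargs -0 -r md5sum
    done | sort | uniq -w32 --all-repeated=separate
}

# Usage (placeholder path):
# dupes_by_size_then_hash /share/MyFolder
```

This walks the folder once per candidate size, so it is not tuned for speed. The point is only that files with a unique size never need to be read at all.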

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Thu May 07, 2015 4:11 pm
by giopas
I think that this would be difficult, and in any case you will never reach the level of customisation that a script can provide.

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Fri May 08, 2015 3:38 am
by smitty123
yep, i think we're on our own with this one.

so, if anyone knows a good program for that,

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Fri May 08, 2015 3:08 pm
by parkerc
I understand from reading online that..

On Synology there are storage analysis reports that can be created and run automatically at a scheduled time, and they work very well. Among other information they return duplicate file candidates, grouped by duplicate and ordered by size (larger duplicate files first), with the possibility to inspect the file(s) or delete one of the versions from the report itself.


It's a shame such a feature has not been implemented on QNAP.

So in the absence of that, there's a Wikipedia page for pretty much everything these days..

http://en.m.wikipedia.org/wiki/List_of_ ... le_finders

But which one to choose?
It would be interesting to know whether one of them is suitable to be made into a QPKG.

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Fri May 08, 2015 5:14 pm
by giopas
Dupeguru seems interesting as it has a GUI, but this could also be its main problem.

Other tools seem fine as well of course... it just depends on the specific needs, checks to be done and possibility to run them on our NAS.

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Thu Aug 13, 2015 2:34 pm
by parkerc
Wouldn't it be great if QNAP were able to add such a facility to their next 4.2 firmware release..

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Fri Aug 14, 2015 4:39 am
by smitty123
yeppers, but i'm not counting on that. especially if they only code for x86 cpus.

But until then, I found this file compare tool to use: http://www.duplicatecleaner.com/

It's got a boatload of options but isn't complicated. One of them is to scan for duplicates using just file size and/or similar file names.
There's of course byte-to-byte comparison too, but if you want to spare your network the load, just by size is fine. You can review each file before you delete it.
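
For comparison, a byte-to-byte check run directly on the NAS would not touch the network at all. A minimal sketch, assuming the "cmp" command is available (it ships with GNU diffutils) and using a made-up function name:

```shell
# Byte-for-byte comparison of two files.
# cmp -s prints nothing and reports the result through its exit status:
# 0 if the files are identical, 1 if they differ.
# same_content is a hypothetical helper name.
same_content() {
    cmp -s "$1" "$2"
}

# Usage (placeholder paths):
# same_content /share/Photos/img1.jpg /share/Backup/img1.jpg && echo "identical"
```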

There's even an image compare, though I haven't used it yet (only available in the Pro version). It can compare images by "looking" at them, and there's a % selector you can use to tell it how close a match they have to be.

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Fri Jan 08, 2016 9:31 pm
by giopas
I don't know whether the above tool is real or spam, but now (starting from QTS v. 4.2.1) with Linux-Station it should be as easy as typing one of the following from the Linux-Station terminal:

for Standard Edition:

Code:

~# sudo apt-get install dupeguru-se

for Music Edition:

Code:

~# sudo apt-get install dupeguru-me

for Picture Edition:

Code:

~# sudo apt-get install dupeguru-pe

I have personally used this tool (on Windows, connecting to my NAS) on several GB of pictures and it worked.

The advantage of using Linux-Station is that you can launch the program, leave it to work and then check the results from your TV or via VNC from your PC.

Re: Duplicate File Finder (File Comparrison, File Compare)

Posted: Sat Jan 09, 2016 6:26 pm
by smitty123
It's real, but your suggestion is interesting.

Looking at https://www.qnap.com/i/useng/app_center ... jump_win=1, it says Linux-Station needs 4 GB of RAM.

So it looks like it won't work on my NAS (ARM, 256 MB of RAM).

Still hoping someone at QNAP will port this to QTS, since it's also Linux-based.