Golang: File Tree Traversal (filepath.Walk)
In this article we’ll see how to walk the file system with Go. We’ll see:
- a simple example of filepath.Walk
- how to pass arguments to filepath.WalkFunc
- how to find duplicate files
- a du implementation
filepath.Walk
As a simple example of filepath.Walk, we’ll list all the files under a directory recursively (simple.go):
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

func printFile(path string, info os.FileInfo, err error) error {
	if err != nil {
		log.Print(err)
		return nil
	}
	fmt.Println(path)
	return nil
}

func main() {
	log.SetFlags(log.Lshortfile)
	dir := os.Args[1]
	err := filepath.Walk(dir, printFile)
	if err != nil {
		log.Fatal(err)
	}
}
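We set the log flags to log.Lshortfile so that every logged error is prefixed with the file name and line number of the logging call, which makes errors easier to spot when they happen. Note that printFile logs any error it receives and returns nil, so the walk keeps going; returning a non-nil error would stop the walk, and filepath.Walk would return that error. Everything else is explained very well in the Go docs.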
Run it with:
$ go run simple.go .
Passing arguments
We can pass arguments to a filepath.WalkFunc through a closure. printFile now doesn’t process the files directly; instead it returns a closure that does the work. The closure can access the arguments we pass to printFile as if they were local variables. For example, to ignore some directories we can pass a list of their names to printFile; whenever the closure finds a directory whose name is in the list, it skips it by returning filepath.SkipDir (ignore.go):
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

func printFile(ignoreDirs []string) filepath.WalkFunc {
	return func(path string, info os.FileInfo, err error) error {
		if err != nil {
			log.Print(err)
			return nil
		}
		if info.IsDir() {
			dir := filepath.Base(path)
			for _, d := range ignoreDirs {
				if d == dir {
					return filepath.SkipDir
				}
			}
		}
		fmt.Println(path)
		return nil
	}
}

func main() {
	log.SetFlags(log.Lshortfile)
	dir := os.Args[1]
	ignoreDirs := []string{".bzr", ".hg", ".git"}
	err := filepath.Walk(dir, printFile(ignoreDirs))
	if err != nil {
		log.Fatal(err)
	}
}
Find file duplicates
For a more realistic application we’ll write a program that finds all the duplicate files under a directory. For each file we compute its SHA-512 digest (crypto/sha512) and look it up in a map. If the digest is already present, the file is a duplicate of the path stored under that digest; otherwise we store the file’s path using the digest as the key (fdup.go):
package main

import (
	"crypto/sha512"
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"path/filepath"
)

var files = make(map[[sha512.Size]byte]string)

func checkDuplicate(path string, info os.FileInfo, err error) error {
	if err != nil {
		log.Print(err)
		return nil
	}
	if info.IsDir() {
		return nil
	}
	data, err := ioutil.ReadFile(path)
	if err != nil {
		log.Print(err)
		return nil
	}
	digest := sha512.Sum512(data)
	if v, ok := files[digest]; ok {
		fmt.Printf("%q is a duplicate of %q\n", path, v)
	} else {
		files[digest] = path
	}
	return nil
}

func main() {
	log.SetFlags(log.Lshortfile)
	dir := os.Args[1]
	err := filepath.Walk(dir, checkDuplicate)
	if err != nil {
		log.Fatal(err)
	}
}
Let’s try it
In a terminal run:
$ mkdir test
$ cd test
$ echo 'run free, run GNU' > gnu
$ echo 'from outer space' > plan9
$ cp gnu free
$ cp plan9 outer
$ ls
free gnu outer plan9
$ cd ..
$ go run fdup.go test
"test/gnu" is a duplicate of "test/free"
"test/plan9" is a duplicate of "test/outer"
du
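Despite filepath.Walk’s usefulness, it cannot model every kind of program; one such program is du. Starting from one directory, du reports the cumulative size of the given directory and of all its subdirectories, recursively. filepath.Walk calls its function on a directory before visiting that directory’s contents, so it offers no natural place to report a directory’s total once all of its children have been counted. Instead we recurse by hand, reading the entries of each directory with (*os.File).Readdir (du.go):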
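package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// du returns the cumulative size of currentPath and, if it is a
// directory, of everything below it, printing each directory's total.
func du(currentPath string, info os.FileInfo) int64 {
	size := info.Size()
	if !info.IsDir() {
		return size
	}

	dir, err := os.Open(currentPath)
	if err != nil {
		log.Print(err)
		return size
	}
	// Read all entries, then close the directory before recursing so
	// that a deep tree does not hold one open file per level.
	fis, err := dir.Readdir(-1)
	dir.Close()
	if err != nil {
		// Readdir returns the entries read before the error; keep them.
		log.Print(err)
	}

	// Readdir never returns "." or "..", so no filtering is needed.
	for _, fi := range fis {
		size += du(filepath.Join(currentPath, fi.Name()), fi)
	}
	fmt.Printf("%d %s\n", size, currentPath)
	return size
}

func main() {
	log.SetFlags(log.Lshortfile)
	dir := os.Args[1]
	info, err := os.Lstat(dir)
	if err != nil {
		log.Fatal(err)
	}
	du(dir, info)
}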
Get the source code
- List files
- List files ignoring some dirs
- Find duplicates
- du implementation