Traverse directories and files recursively
Today, I decided to blog about this because it was a painful experience working with this. Below is my completed script.
************************************
************************************
#!/bin/bash
#set -x
traverse()
{
ls -A "$3" | while read i
# for i in "$3"/*;
do
# echo "$3/$i"
if [ -f "$3/$i" ]; then
# echo "Traversing File: $3/$i"
bash -c ""$2" \""$3"/"$i"\""
# echo " "
else
# echo "Traversing Directory: $3/$i"
bash -c ""$1" \""$3"/"$i"\""
#traverse recursively if not empty
if [ "$(ls -A $3/$i)" ]; then
traverse "$1" "$2" "$3/$i"
fi
fi
done
}
SAVEIFS=$IFS
IFS=$'\n'
if [ -z "$1" ]; then
"traverse [map to dirs] [map to files] [directory or file]"
else
if [ -z "$2" ]; then
"traverse [map to dirs] [map to files] [directory or file]"
else
if [ -z "$3" ]; then
"traverse [map to dirs] [map to files] [directory or file]"
else
# bash -c ""$1" "$3""
traverse "$1" "$2" "$3"
fi
fi
fi
IFS=$SAVEIFS
************************************
************************************
Here is the simple manual
"traverse [map to dirs] [map to files] [directory or file]"
My script will traverse directory tree (root is the third argument), apply a function (first argument) to directories and another function (second argument) to files.
There are so many things needing special care
1) Choosing what kind of script
I initially used "#!/bin/sh"; however, I got many unexpected errors and I was wondering what they were. That changing to "#!/bin/bash" can so the question is unclear to me, but it works out.
As a consequence, this is an example how to run it
bash traverse.sh "ls" "ls" ~
2) How to traverse a directory.
In my script, I have 2 options:
ls -A "$3" | while read i
or
for i in "$3"/*;
The second option can not work because it can not recognize hidden directories and file.
Then, I need to differentiate between files and folders using
if [ -f "$3/$i" ]; then
#FILE
else
# DIRECTORY
if [ "$(ls -A $3/$i)" ]; then
#IF DIRECTORY is not EMPTY
traverse "$1" "$2" "$3/$i"
fi
A list of options for checking directories and files can be found at http://www.comp.leeds.ac.uk/hannah/linux/shell.html
3) How to traverse a directory correctly
It depends on what kinds of files and folder I am considering. In my case, I want all files including the hidden directory. This can be done with "ls -a" option. However, this will also list current and parent directories which will cause a infinite traversal. Therefore, "ls -A" will list hidden items but not the current and parent dirs. Additionally, I have to check if the child directory is empty or not before deciding to traverse into it.
4) Dealing with white spaces in filenames
This is the most trickiest things during my scripting. After searching painfully, I found the solution.
First, I have to add this
SAVEIFS=$IFS
IFS=....
MY SCRIPT
....
IFS=$SAVEIFS
This will make "ls -A "$3" | while read i" work correctly. For each while iteration, i is the full name of a file or a directory including spaces. Without modifying $IFS, filenames would not be read correctly.
Secondly, to represent a file with spaces correctly on a command, we should but it in a single quote or double quote. That's why we have a quite complex script at
bash -c ""$2" \""$3"/"$i"\""
5) Single quotes vs double quotes
There are plentiful articles discussing about this. This is what I wanna stress: "variable names are expanded within double quotes, but not single quotes." More details at http://www.panix.com/~elflord/unix/bash-tute.html
6) Dealing with the unknown
Why are they unknown? Because I don't know why my solutions solved the problems.
First, adding "#!/bin/bash" as explained above.
Second, adding "bash -c" before my command.
bash -c ""$2" \""$3"/"$i"\""
Removing "bash -c" will generate some unknown errors.
Third, sometimes we should pay attention to use the full path for certain commands such /usr/bin/time ...
Hope this is helpful to avoid painful future for those needing this script.
************************************
************************************
#!/bin/bash
#set -x
traverse()
{
ls -A "$3" | while read i
# for i in "$3"/*;
do
# echo "$3/$i"
if [ -f "$3/$i" ]; then
# echo "Traversing File: $3/$i"
bash -c ""$2" \""$3"/"$i"\""
# echo " "
else
# echo "Traversing Directory: $3/$i"
bash -c ""$1" \""$3"/"$i"\""
#traverse recursively if not empty
if [ "$(ls -A $3/$i)" ]; then
traverse "$1" "$2" "$3/$i"
fi
fi
done
}
SAVEIFS=$IFS
IFS=$'\n'
if [ -z "$1" ]; then
"traverse [map to dirs] [map to files] [directory or file]"
else
if [ -z "$2" ]; then
"traverse [map to dirs] [map to files] [directory or file]"
else
if [ -z "$3" ]; then
"traverse [map to dirs] [map to files] [directory or file]"
else
# bash -c ""$1" "$3""
traverse "$1" "$2" "$3"
fi
fi
fi
IFS=$SAVEIFS
************************************
************************************
Here is the simple manual
"traverse [map to dirs] [map to files] [directory or file]"
My script will traverse directory tree (root is the third argument), apply a function (first argument) to directories and another function (second argument) to files.
There are so many things needing special care
1) Choosing what kind of script
I initially used "#!/bin/sh"; however, I got many unexpected errors and I was wondering what they were. That changing to "#!/bin/bash" can so the question is unclear to me, but it works out.
As a consequence, this is an example how to run it
bash traverse.sh "ls" "ls" ~
2) How to traverse a directory.
In my script, I have 2 options:
ls -A "$3" | while read i
or
for i in "$3"/*;
The second option can not work because it can not recognize hidden directories and file.
Then, I need to differentiate between files and folders using
if [ -f "$3/$i" ]; then
#FILE
else
# DIRECTORY
if [ "$(ls -A $3/$i)" ]; then
#IF DIRECTORY is not EMPTY
traverse "$1" "$2" "$3/$i"
fi
A list of options for checking directories and files can be found at http://www.comp.leeds.ac.uk/hannah/linux/shell.html
3) How to traverse a directory correctly
It depends on what kinds of files and folder I am considering. In my case, I want all files including the hidden directory. This can be done with "ls -a" option. However, this will also list current and parent directories which will cause a infinite traversal. Therefore, "ls -A" will list hidden items but not the current and parent dirs. Additionally, I have to check if the child directory is empty or not before deciding to traverse into it.
4) Dealing with white spaces in filenames
This is the most trickiest things during my scripting. After searching painfully, I found the solution.
First, I have to add this
SAVEIFS=$IFS
IFS=....
MY SCRIPT
....
IFS=$SAVEIFS
This will make "ls -A "$3" | while read i" work correctly. For each while iteration, i is the full name of a file or a directory including spaces. Without modifying $IFS, filenames would not be read correctly.
Secondly, to represent a file with spaces correctly on a command, we should but it in a single quote or double quote. That's why we have a quite complex script at
bash -c ""$2" \""$3"/"$i"\""
5) Single quotes vs double quotes
There are plentiful articles discussing about this. This is what I wanna stress: "variable names are expanded within double quotes, but not single quotes." More details at http://www.panix.com/~elflord/unix/bash-tute.html
6) Dealing with the unknown
Why are they unknown? Because I don't know why my solutions solved the problems.
First, adding "#!/bin/bash" as explained above.
Second, adding "bash -c" before my command.
bash -c ""$2" \""$3"/"$i"\""
Removing "bash -c" will generate some unknown errors.
Third, sometimes we should pay attention to use the full path for certain commands such /usr/bin/time ...
Hope this is helpful to avoid painful future for those needing this script.
2 Comments:
This comment has been removed by the author.
You need to use the full path when the environment variables are not the same for all accounts.
If your path is in the result of "export | grep PATH" then no-need to use the full path.
But this actually is a painful experience. The environment of different users are different. It works fine when we test, but it can't run on the real system (with another user environments) due to missing environment variables. Especially, when we can't login into that user (apache... for eg)
Post a Comment
<< Home