Discussion:
Files being changed to size zero bytes
Murray Strome
2011-02-27 14:43:14 UTC
Permalink
On one computer running Kubuntu 10.04, I am finding that a lot of files
have been changed to be of size zero bytes. Some time ago, I asked for
help with a script to replace zero byte files with the version on a DVD
to which I had written these files. At that time, all the bad files had
a single date associated with them. However, this morning, I noticed
many other files have been changed to zero bytes.

Does anyone have any idea as to why this is happening? The dates of the
changed files now have no particular pattern. More importantly, what
should I do to prevent this from continuing to occur?

Thanks for any suggestions.

Murray
Alan W. Irwin
2011-02-27 17:06:56 UTC
Permalink
Post by Murray Strome
[...] Does anyone have any idea as to why this is happening? The dates of the
changed files now have no particular pattern. More importantly, what should I
do to prevent this from continuing to occur?
Are you using the ext4 filesystem, and have there been some power
outages since you last dealt with the zero-length file problem?

If so, I suspect you are running into the problem described at
http://ezinearticles.com/?How-to-Solve-Zero-Length-File-Problem-in-Linuxs-Ext4-File-System?&id=4240780.
See also,
http://en.wikipedia.org/wiki/Ext4#Delayed_allocation_and_potential_data_loss.

My understanding of the issue is a little shaky, but from what I have
read, Linux apps and libraries became too dependent on non-standard
ways to create files under ext3. The result is that, in the event of a
power loss with ext4, you end up with zero-length files for files
created during the last minute or so before the power loss. And the
number of those zero-length files will increase for each new
power-loss event that occurs. Apparently, the issue can also happen
with ext3, but it is much less likely because the allocation delay is
a lot less.

According to the above Wikipedia article, kernel version 2.6.30 got a
fix that substantially reduced the likelihood of creating zero-length
files for ext4 filesystems. This fix was backported to earlier
kernels by some Linux distributions. 'For instance Ubuntu made them
part of the 2.6.28 kernel in version 9.04 ("Jaunty Jackalope").' But
since you mentioned Kubuntu 10.04 above, it appears you
already have those kernel fixes. So maybe the fixes (which are
advertised to use ext3-like quick allocation in selected cases for
ext4) are not as reliable as using ext3 in the first place? If that
is the case, switching your filesystems back to ext3 might be the
answer, but that is no sure thing.
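
From what I have read (so treat this as a sketch, not gospel), that
2.6.30 fix is controlled by the ext4 mount option "auto_da_alloc", and
you can check whether it is in effect and turn it on explicitly; the
device and mount point below are examples only:

   # see which options the root filesystem is currently mounted with
   mount | grep ' / '
   # remount with the zero-length-file workaround explicitly enabled
   sudo mount -o remount,auto_da_alloc /dev/sda1 /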

I have read articles about this Linux filesystem issue for a long time
now, and from your experience it is still not resolved so I think you
can expect it will take quite a long time to completely solve it.
Meanwhile, I would advise working around it by using an uninterruptible
power supply (UPS) for your computer to ensure there are no power
outages. A UPS is probably a good idea in any case because computers
tend to last longer (at least that is my experience compared to others
here in town) if they are protected with a UPS. Anyhow, because
my computers are protected by UPS's I have never seen the zero-length
file issue.

I have been using two simple Back-UPS's from APC since 1996 (and 2001)
without issues other than the occasional required battery change. I
speculate that APC have now discontinued selling those popular units
because they were too reliable. They, instead, now want to sell you
complicated UPS's where there is a lot more that can go wrong. I
suggest it would be a good idea to beat that game by buying
reconditioned APC Back-UPS's from independent suppliers. The fact
that such a reconditioned APC Back-UPS market exists is another
indication of just how popular those units have been.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state implementation
for stellar interiors (freeeos.sf.net); PLplot scientific plotting software
package (plplot.org); the libLASi project (unifont.org/lasi); the Loads of
Linux Links project (loll.sf.net); and the Linux Brochure Project
(lbproject.sf.net).
__________________________

Linux-powered Science
__________________________
Murray Strome
2011-02-27 21:54:41 UTC
Permalink
Post by yudi santoso
Hi Murray,
Is this the same system that you had problems with before? Are the files
regularly worked on (e.g. write, move, sync, etc.)?
Yudi
Post by Murray Strome
[...]
Post by Alan W. Irwin
[...]
Thanks for your replies. Yes, Yudi, it is the same file system. The latest
files for which I noticed the problem have not been worked on for quite
a while, but they could have been copied to DVD or external drive.

Alan: thank you very much for your information. Yes, the file system is
ext4 and I had not heard of the problem you mentioned before. I cannot
remember if I had any power failures recently, but it is possible. Also,
I believe that at least a couple of times in the past year, I have had
the system freeze in a manner which forced me to do a hardware reset,
which would be the equivalent of a power failure. I have not noticed the
problem on my other computers, but then I might have gone another six
months without noticing anything was wrong, except that I found a backup
DVD from last year and decided to do a search in my home directory for
files of 0 length. Also, very few, if any of the files in question were
being "worked on" at the time of any such power outage.

From the articles you referenced, it sounds like this could be the cause
of my problems. Hopefully, I will be able to recover most of those files
(most are backed up somewhere, but I will have to make sure the backups
are clear of those errors).

The UPS is a good idea. Before I retired, at work I usually had UPS
protection and the units I used then would send a signal to the computer
to tell it to shut down safely. Another thing I always used to do on
UNIX machines was to run "sync" at least a couple of times before shutting down.
I kind of assumed that this happens automatically with the newer systems
(but of course, not in a power failure situation).

In the meantime, can you suggest any free data recovery software which
addresses this issue that I might try? The only thing I could find was
Stellar Phoenix Linux Data Recovery Software, which requires MS Windows
(the download is a .exe) and costs $80 US. They have a free evaluation
version, but from what I read, that doesn't actually fix anything. It
just tells you what the paid version /could/ fix. I assume the
downloaded program would create a bootable CD.

Murray
Alan W. Irwin
2011-02-27 22:50:35 UTC
Permalink
Alan: [...]
From the articles you referenced, it sounds like this could be the cause of my
problems. Hopefully, I will be able to recover most of those files (most are
backed up somewhere, but I will have to make sure the backups are clear of
those errors).
The UPS is a good idea. Before I retired, at work I usually had UPS
protection and the units I used then would send a signal to the computer to
tell it to shut down safely.
Hi Murray:

The Debian (and probably Ubuntu) package called apcupsd handles such
details for APC UPS's. I must say, though, that I have gotten really
lazy about using that since virtually all power glitches I experience
are short term (so the battery can hold the system without any
need to shutdown).
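
A minimal sketch of the setup (the configuration values are examples,
not a recommendation):

   sudo apt-get install apcupsd
   # then, in /etc/apcupsd/apcupsd.conf, for a USB-attached APC unit:
   #   UPSCABLE usb
   #   UPSTYPE usb
   #   DEVICE
   #   MINUTES 5    (shut the system down with 5 minutes of battery left)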

You also mentioned you have been hitting the reset button.
Obviously, in light of the zero-length file problem you should try to
avoid that as much as possible. For example, it would be much better
to ssh into the system and run the shutdown command for a smooth
shutdown if at all possible.

Could you publish (again?) your script for finding zero-length files?
I must say I have also been absolutely forced to hit the reset button
myself once or maybe twice in the last few years. So I should look
for such files myself. However, I also use ext3 so it may not be an
issue for me.
In the meantime, can you suggest any free data recovery software which
addresses this issue that I might try? The only thing I could find was
Stellar Phoenix Linux Data Recovery Software, which requires MS Windows (the
download is a .exe) and costs $80 US. They have a free evaluation version, but
from what I read, that doesn't actually fix anything. It just tells you what
the paid version /could/ fix. I assume the downloaded program would create a
bootable CD.
There is obviously nothing you can recover for new files that were
scheduled to be written from memory to disk when the power went out. I
suppose it is possible you could buy software to allow you to recover
the old file for the case where an old file was being updated when the
power went out producing a zero-length result. But I don't think the
normal Linux filesystem tools have this option so I am not sure it is
possible. Probably the best approach is to keep good backups and
check for zero-length files after every reset/power outage, and
restore those files from backup.

BTW, my knowledge of the zero-length file issue is limited to just
what I read from those two articles so ideally some filesystem expert
lurking here with some practical experience with the issue will chime
in with the definitive way (perhaps a mount option?) to always assure
reliable results (either the old file or the new file with no
zero-length files) for the ext4 and ext3 cases. Given the choice
between speed and reliability, I will take reliability every time
since finding zero-length files and restoring from backup is such a
pain after every reset/power outage.

Alan
Murray Strome
2011-02-28 00:09:28 UTC
Permalink
Post by Alan W. Irwin
[...]
Could you publish (again?) your script for finding zero-length files?
I must say I have also been absolutely forced to hit the reset button
myself once or maybe twice in the last few years. So I should look
for such files myself. However, I also use ext3 so it may not be an
issue for me.
I simply use the "Find" tool in Konqueror (I install it even if I am
using Gnome). I leave the file name field blank and under properties, I
select the option "At Least", put "1" in the size location and change
the units to "Bytes". For most things, I prefer Konqueror to Dolphin or
Nautilus.
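
For anyone who would rather script it, a one-line equivalent (searching
the home directory is just an example) is:

   # list all regular files of exactly zero bytes under $HOME
   find "$HOME" -type f -size 0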
Post by Alan W. Irwin
[...]
There is obviously nothing you can recover for new files that were
scheduled to be written from memory to disk when the power went out. I
suppose it is possible you could buy software to allow you to recover
the old file for the case where an old file was being updated when the
power went out producing a zero-length result. But I don't think the
normal Linux filesystem tools have this option so I am not sure it is
possible. Probably the best approach is to keep good backups and
check for zero-length files after every reset/power outage, and
restore those files from backup.
I really think there must be more to the problem, as I cannot believe
that so many files (730) would have been in the process of being
written/updated at the times of the few power outages. This would have
happened at most four times in the past year or so.
Most importantly, a lot of those files have become zero bytes in length
since October 2010 when I first noticed this problem. At that time, all
of the zero byte files had a single date associated with them
(2010-08-04). Now they are all over the map, 2005-01-19 to 2010-08-18,
excluding a bunch of .??? (system?) files, which may be legitimate and
come up to today.
Post by Alan W. Irwin
[...] Given the choice
between speed and reliability, I will take reliability every time
since finding zero-length files and restoring from backup is such a
pain after every reset/power outage.
Alan
I would also prefer reliability to speed. I think I would have stuck with
ext3 if I had had any idea of this problem in ext4. I will now have to
look at our other computers to see if they have problems as well. The
only real difference now is that the other ones all have a variation of
Ubuntu rather than Kubuntu. Perhaps I should do a clean install on this
computer and use ext3 instead of ext4.

Murray
yudi santoso
2011-02-28 02:09:56 UTC
Permalink
Hi Murray,
I was thinking that a recurring problem on the same computer system could
point to a hardware problem. Faulty memory, hard drive, or power supply could
lead to file corruption, though usually not zero-byte files. But I think it
makes sense with the delayed allocation feature of ext4, as pointed out by
Alan, if this is related to power interruption.

I'm not sure how ext4 does the time stamps (whether they are also delayed or
not), but could you check the last modification times of the files (using
"ls -lt")? One of the dates you mention below is 2005. I don't think you were
already using ext4 in 2005.
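
For example (the path is a placeholder):

   # list the suspect files, newest first, with full timestamps
   ls -lt --full-time ~/some/directory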

Yudi
Post by Murray Strome
[...] Most importantly, a lot of those files have become zero bytes in
length since October 2010 when I first noticed this problem. At that time,
all of the zero byte files had a single date associated with them
(2010-08-04). Now they are all over the map, 2005-01-19 to 2010-08-18,
excluding a bunch of .??? (system?) files, which may be legitimate and come
up to today.
[...]
Michael Foltinek
2011-03-02 01:52:07 UTC
Permalink
Post by Murray Strome
In the meantime, can you suggest any free data recovery software which
addresses this issue that I might try? [...]
Murray
I'd have to second Cy's suggestion of tools you'll find in the Autopsy
toolkit. If I had more information I might be able to suggest
something more specific, but my experience is mostly with commercial
forensics tools that run on Windows.


--
True compassion is more than flinging a coin at a beggar; it comes to
see that an edifice which produces beggars needs restructuring.
    - Dr. Martin Luther King Jr.

Murray Strome
2011-03-01 04:39:40 UTC
Permalink
Post by Alan W. Irwin
[...] Meanwhile, I would advise working around it using an uninterruptible
power supply (UPS) for your computer to ensure there are no power
outages. [...]
I have been looking more closely at the files which have been set to
zero length, and I now think that my problem was not likely caused by a
power outage (or hardware reset) after all.

For the past while, whenever I have travelled, I have either copied the
files that I was actively using to an external hard drive or
synchronized those files using Unison. Then I have synchronized those to
my laptop or notebook before I left. When I returned home, I did the
reverse, i.e. used Unison to synchronize the files to the external drive
then from that to my desktop. I believe (but cannot be certain) that
files which were never copied this way are unaffected (whether they have
been changed on the desktop system or not). I think it is only a few of
the files (well a couple of hundred out of many thousands) which I have
transferred back and forth.

For the files I felt I might need on my wife's computer, I simply made a
DVD containing those since I normally would not be changing them, just
reading them (e.g. all her recipes). I don't think any of her files have
zero bytes, even though her computer has had similar power outages.

When I have copied files back to my desktop, I have most often just used:

cp -R -u --preserve=timestamps [directory on external drive] [directory on desktop]

This seemed to work quite well, since only files that I had changed
while away would be copied back.

I now suspect that either there is something wrong with Unison, or less
likely, with cp. Since I first noticed the problem a long time after I
did any of these operations, I cannot be certain whether this is the key
or not. From the discussions on ext4 (which I had never even considered
looking at before), it would appear that there may be quite a few
applications which do not properly account for the long allocation delay
with ext4.

Very few of the files which are affected would have been in the process
of being modified at the time of any power outage. It will take me quite
a bit of checking to be certain, but I suspect that all of the affected
files may have been within the directories that were copied back and forth.

One side benefit of this has been that it has encouraged me to look at
those files and to cull a lot that I really don't need to keep. I have
already deleted over 2/3 of the ones with zero bytes which I have no
good reason to keep or try to recover, plus many, many others in the
same sub-directories as those affected!!

I will probably go back to ext3 as Alan has suggested. However, from
what I have read, I think that means copying everything to another drive
(at least the home directory), then doing a fresh install. This, of
course, will mean going through the painful process of reinstalling all
my third-party stuff and getting all the non-standard repositories
again. I am going to watch my list of zero byte files for a few days to
see if there are any changes before doing this. If not, it is not too
long till Ubuntu 11.04 is due to be released. If I want to upgrade to
that, I will have to go through all that pain anyway.

If it turns out NOT to be a power outage issue, the UPS would not have
helped. I do agree that it is a good idea, though.

Murray
Alan W. Irwin
2011-03-01 04:57:28 UTC
Permalink
Post by Murray Strome
I now suspect that either there is something wrong with Unison, or less
likely, with cp. Since I first noticed the problem a long time after I did
any of these operations, I cannot be certain whether this is the key or not.
One thing you can do to make sure copying is being done correctly is
to systematically and recursively use md5sum on all the files that are
copied. That is, use "find -type f" to find all the files in a
directory tree that is going to be copied, and then run md5sum on that
file list.

Then after the copy is completed, do md5sum -c to make sure all files
in the copied directory tree have the same checksum. I routinely use
this method when copying CD's from library books (where it is legal to
do so, such as the CD's in the back of many SF books from Baen) or when
doing extensive tests of (say) a new drive or new filesystem. It gives
a huge feeling of confidence that all copying is working correctly.
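
A minimal sketch of that workflow (the directory names are examples):

   cd /path/to/original
   # record a checksum for every file in the tree
   find . -type f -print0 | xargs -0 md5sum > /tmp/checklist.md5
   # ... copy the tree to /path/to/copy by whatever means ...
   cd /path/to/copy
   # verify every copied file against the recorded checksums
   md5sum -c /tmp/checklist.md5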

Alan
Viorel Tabara
2011-03-01 05:13:59 UTC
Permalink
On Mon, 28 Feb 2011 20:57:28 -0800 (PST), "Alan W. Irwin"
Post by Alan W. Irwin
One thing you can do to make sure copying is being done correctly is
to systematically and recursively use md5sum on all the files that are
copied. [...]
You can also look into rsync's '--checksum' option. I've found 'rsync' a lot
more powerful than 'cp'.
Alan W. Irwin
2011-03-01 09:02:13 UTC
Permalink
Post by Viorel Tabara
On Mon, 28 Feb 2011 20:57:28 -0800 (PST), "Alan W. Irwin"
Post by Alan W. Irwin
[...]
You can also look into rsync's '--checksum' option. I've found 'rsync' a lot
more powerful than 'cp'.
Good point. I agree rsync is certainly a powerful and useful piece of
software. I double-checked the --checksum part of the rsync manpage,
and it appears rsync does automatic checksums for each file it
transfers regardless of whether you use the --checksum option or not.
Thus, rsync automatically does something like what I suggested above
on first copy.

Note that what the --checksum option actually does is make the algorithm
for deciding if a file is different (and thus needs transferring)
between sender and destination exact. However, that option is _much_
slower than the quick but inexact default method which is to check
dates and sizes of the sender and destination files to decide what has
to be transferred. So I would suggest using --checksum only if you
are unsure about long-term storage reliability, e.g., if you want to
check whether bits are getting randomly flipped in your source or
backup disk between rsync runs. If such flips occurred it would not
change the size or date of either the source or destination file and
default rsync would simply ignore such "flipped" files. But the
--checksum option (say combined with --dry-run) after a default run to
backup a filesystem would identify all remaining file pairs with
identical sizes and dates but differing bits. In the past, I have
used this method to prove to my satisfaction that file pairs between
an external disk that was rsync-mirrored to an internal disk had no
bit errors, thus confirming the reliability of both the internal and
external disks. But --checksum took a very long time to complete
so I only did this check one time.
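
As a concrete sketch of both modes (the paths are examples):

   # normal quick mirror: compares sizes and dates only
   rsync -a /data/ /backup/data/
   # later: list, without copying anything, any file pairs whose sizes
   # and dates match but whose contents differ
   rsync -a --checksum --dry-run --itemize-changes /data/ /backup/data/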

Alan
Lionel Widdifield
2011-03-01 07:16:33 UTC
Permalink
Post by Murray Strome
I will probably go back to ext3 as Alan has suggested. However, from
what I have read, I think that means copying everything to another drive
(at least the home directory), then doing a fresh install. This, of
course, will mean going through the painful process of reinstalling all
my third-party stuff and getting all the non-standard repositories
again.
While it is far from simple (i.e., not as simple as Partition Magic), it can
be and is done regularly; it just requires some planning. Having done it a
few score of times helps as well.


To change a filesystem, you simply reformat it with the new desired
filesystem, AFTER having backed up the original contents and confirmed they
are correct, then restore the files. Having a spare drive makes this easy;
having RIPLinux set up to boot via PXE on a network makes for a eureka moment.

RIPLinux boot
mount drive, tar files to storage.
unmount drive
reformat
untar files from storage.
verify boot partition iff involved.
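
Concretely, those steps might look like this (device name, mount point,
and storage path are examples only):

   mount /dev/sda1 /mnt/work                    # mount drive
   tar -C /mnt/work -cpf /storage/home.tar .    # tar files to storage
   umount /mnt/work                             # unmount drive
   mkfs.ext3 /dev/sda1                          # reformat
   mount /dev/sda1 /mnt/work
   tar -C /mnt/work -xpf /storage/home.tar      # untar files from storage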
--
Lionel Widdifield
Murray Strome
2011-03-01 14:29:11 UTC
Permalink
Adding information to the puzzle:

After examining a lot of the affected files, it turns out that most are
in a directory, call it ~/Home/Directory_A.

It turns out that the vast majority of those files in the root of
Directory_A have the same name as a file within one of Directory_A's
subdirectories which is NOT of zero length.

Thus, it appears to me that somehow, zero-byte-length files have been
created with the names of some of the "real" files in the subdirectories.
I will have to investigate further to see if there are any exceptions to
this. However, having eliminated the zero-byte files corresponding to ones
I have found so far in subdirectories, I have gone from over 700 when I
first started this exercise to about 20 left to check! Some of the
remaining are symbolic links to drives which are no longer mounted.
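
A sketch of a check for such shadowed names, assuming the layout described
above:

   cd ~/Home/Directory_A
   # for each zero-byte file in the top level, look for a non-empty
   # file of the same name somewhere in the subdirectories
   find . -maxdepth 1 -type f -size 0 -printf '%f\n' |
   while read -r name; do
       find . -mindepth 2 -type f -name "$name" ! -size 0
   done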

Thus, the problem is nowhere near as disastrous as I first thought.

Murray
Murray Strome
2011-03-02 00:05:24 UTC
Permalink
Post by Murray Strome
[...]
Thus, the problem is nowhere near as disastrous as I first thought.
Now that "all the dust has settled", it turns out that I have lost only
three files (which I may still be able to locate on a backup HD or DVD).
With that small number, I could easily have accidentally deleted them at
some point in time.

Thanks to everyone who contributed to the discussion and who gave
valuable suggestions and advice.

Murray
yudi santoso
2011-02-27 17:07:43 UTC
Permalink
Hi Murray,
Is this the same system that you had problems with before? Are the files
regularly worked on (e.g. write, move, sync, etc.)?

Yudi
Post by Murray Strome
[...]
from Saanichton BC
2011-02-27 16:50:40 UTC
Permalink
Hi Murray,

What file system are you using? The first thing I thought of when I read
this was that an fsck was done with the partition mounted, but that was just
a guess. Knowing how your partitions are laid out and in which of them these
files are might also help.

Chris
Post by Murray Strome
[...]