Performance optimization in av::fixup - use buffered IO, not mapped file

Wed Dec 12 17:06:00 GMT 2012

On 12/12/2012 11:58 AM, Corinna Vinschen wrote:
> On Dec 12 09:47, Eric Blake wrote:
>> On 12/12/2012 08:39 AM, Ryan Johnson wrote:
>>
>>> Does gcc/ld/whatever know the final file size before the first write?
>> No, but does it need to?  posix_fallocate() does not change file
>> contents; it merely says that anywhere there was previously a hole must
>> now be guaranteed to be backed by disk.  So gcc would write the file as
>> usual, and then just before close()ing the fd, do a final
>> posix_fallocate(fd, 0, len) with len determined by the final file size.
>>
>>> You have to posix_fallocate the entire file before any write that might
>>> create a hole, because the sparse flag poisons the loader,
>> Is there really a flag stuck into the file when it becomes sparse?
> Yes.  And, as I wrote, you can't remove it pre-Vista.
>
>>> cp --sparse=never $(which emacs-nox) dense
>>> for f in sparse dense; do echo $f; time ./$f -Q --batch --eval '(kill-emacs)'; done
>>> cp --sparse=never dense sparse
>>> for f in sparse dense; do echo $f; time ./$f -Q --batch --eval '(kill-emacs)'; done
>>> du dense sparse
>> This doesn't point to a flag in the file, so much as cached information
>> (the file system is remembering that 'sparse' used to be sparse, even if
>> it is no longer sparse).  But your point about a file being cached at
>> some point while it is sparse, even if it is later made non-sparse, is
>> interesting.
>>
>>> The relevant output is:
>>>> sparse
>>>> real    0m1.791s
>>>>
>>>> dense
>>>> real    0m0.606s
>>>>
>>>> sparse
>>>> real    0m3.158s
>>>>
>>>> dense
>>>> real    0m0.081s
>>>>
>>>> 16728   dense
>>>> 16768   sparse
>>> Given that we're talking about cygwin-specific patches for emacs and
>>> binutils anyway, would it be better to add a cygwin-specific fcntl call
>>> that clears the file's sparse flag?
>> What flag is there to clear?  Your cp demonstration showed that even
>> when we do a byte-for-byte copy of every byte (and the file is
>> non-sparse), the file system cache remembers that it used to be sparse.
>>   How do we defeat that file system cache?
> Another question is, is that behaviour reproducible?  Does it happen the
> second time the "new" non-sparse sparse file is called?  You don't even
> know whether the slowness is a result of the file write still being in flight.
> Windows caching can be pretty slow at times, but it recovers quickly
> if a file is used again, usually.
It's painfully reproducible. It takes nearly two hours for a gcc 
bootstrap compiler to configure the various bits of the next stage. It's 
the same for emacs unexec (as OP reported).

I've seen how slow the cache is: it can take up to a minute before du 
reports the actual on-disk size of a freshly copied sparse file. At first 
I thought cp --sparse=always had a bug...
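du reports the space actually allocated to a file, not its apparent length. A rough C equivalent is to compare st_blocks with st_size, as in the minimal sketch below. It rests on several assumptions: the function names are hypothetical; st_blocks is taken to count 512-byte units, which holds on Linux, but POSIX leaves the unit unspecified, so check it on the target system (Cygwin included); and compression can also make the allocation smaller than the size. Most importantly, this only sees allocation, not the NTFS sparse attribute itself, which is what the thread suspects keeps slowing the loader down.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <sys/types.h>

/* Assumption: st_blocks counts 512-byte units, as on Linux.
 * POSIX does not fix this unit, so verify it on the target system. */
#define ASSUMED_BLOCK_UNIT 512LL

/* Hypothetical helper: returns 1 if fewer bytes are allocated than the
 * apparent size (holes, or possibly compression), otherwise 0. */
static int allocation_looks_sparse(long long size, long long blocks)
{
    return blocks * ASSUMED_BLOCK_UNIT < size;
}

/* Hypothetical wrapper: stat() the path and apply the check above.
 * Returns -1 if stat() fails. */
static int file_looks_sparse(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return allocation_looks_sparse((long long)st.st_size,
                                   (long long)st.st_blocks);
}
```

Given the caching delay described above, a check like this taken right after the copy may report stale numbers, just as du does.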

Even after du stabilizes, though, the slow loading persists 
indefinitely. No matter how many times or how recently the binary has 
run, you still pay the full cost of pulling it off disk again. This is 
easy to confirm with Resource Monitor, which shows the same file being 
read by umpteen different processes simultaneously.

$ for i in $(seq 20); do time ./sparse -Q --batch --eval '(kill-emacs)'; done 2>&1 | grep real | awk '{print $2}'
0m1.714s
0m1.548s
0m1.588s
0m1.570s
0m1.528s
0m1.563s
0m1.512s
0m1.676s
0m1.638s
0m1.663s
0m1.533s
0m1.567s
0m1.466s
0m1.669s
0m1.575s
0m1.489s
0m1.658s
0m1.497s
0m1.515s
0m1.541s
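For comparison, Eric's suggestion quoted above (write the file as usual, then call posix_fallocate(fd, 0, len) with the final size just before close()) could look roughly like the minimal POSIX sketch below. The helper name allocate_whole_file is hypothetical. It only illustrates the mechanics of the proposal. As Ryan points out, once the file has been sparse the NTFS sparse attribute stays set (and, per Corinna, cannot be removed at all before Vista), so filling the holes afterwards is not shown here to avoid the slow loading measured above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper sketching Eric's suggestion: call it after the last
 * write and just before close(fd). Because offset + len equals the current
 * size, posix_fallocate() changes neither the contents nor the size; it only
 * asks the file system to allocate storage for any holes in [0, st_size).
 * Returns 0 on success or an error number. Note that posix_fallocate()
 * returns the error number directly instead of setting errno. */
static int allocate_whole_file(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return errno;
    if (st.st_size == 0)
        return 0; /* empty file: nothing to allocate, and some systems
                     (e.g. Linux) reject len == 0 with EINVAL */
    return posix_fallocate(fd, 0, st.st_size);
}
```

A linker or unexec implementation would call allocate_whole_file(fd) once, right before close(fd). Ryan's objection is that on Cygwin the call would have to come before the first write that could create a hole.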

Ryan