Avoid modifying scalars connected to string filehandles

Since Perl 5.8, you can treat a string as a file (Item 54. Open filehandles to and from strings). You can open a filehandle, read from the string, write to the string, and most of the other things that you can do with a file. There are some gotchas though, when you deal with that string as a normal string and a filehandle at the same time. We’ve filed this as RT 78980: Odd behavior when string filehandles and scalar assignment collide.

Start by opening a filehandle to a string. Instead of a filename, you use a reference to a string. In this Item, you’ll use a write-filehandle to a string:

my $string = '';
open my $string_fh, '>', \ $string;

You can also do that in one step:

open my $string_fh, '>', \ my $string;

Once you have the filehandle, you can write to it as any other filehandle. Whatever you output is immediately available in the string:

print $string_fh "Buster likes liver treats\n";
print STDOUT $string;
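As a minimal sketch of the same idea in a form you can run on its own, this version checks the return value of open and prints twice before looking at the string. The second line of output ("Mimi") is only illustrative:

```perl
use strict;
use warnings;

# Open a write filehandle on a scalar and check the result,
# just as you would for a real file.
open my $string_fh, '>', \my $string
    or die "Could not open string filehandle: $!";

print {$string_fh} "Buster likes liver treats\n";
print {$string_fh} "Mimi\n";

# Both lines are already in $string; there is no flush or close here.
print STDOUT $string;
```

The braces around the filehandle are optional here, but they make it obvious which argument is the handle and which is the data.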

That’s very handy, and in Item 55: Make flexible output, you see how to use that to avoid tricks with capturing output when you really want to keep it inside your program.

If you change the string with normal string operations though, the filehandle portion might not see what you’ve done and things go a little weird. Try printing to the string through the filehandle, then setting the string to the empty string, and finally printing to it again:

use strict;
use warnings;

open my $string_fh, '>', \my $string;
print $string_fh "Buster likes liver treats\n";
print "1. string is [$string]\n";

$string = '';
print "2. string is [$string]\n";

print $string_fh "Mimi";
print "3. string is [$string]\n";

The output for the first two checks on the string looks like what you expect, but the output for the last check is odd:

% perl5.12.2 string_fh.pl
1. string is [Buster likes liver treats
]
2. string is []
3. string is [uster likes liver treats
Mimi]

Look closely at the fourth line of the output, the one starting with 3. string is. It looks like the previous string, Buster ..., has reappeared, but it also looks like the initial B is missing. That seems really odd. Why would only the first character disappear? When you look at the individual characters, though, you see that there is actually a character there that you can’t see:

# add this line to the previous program
print join ' ', map { 
	my $o = sprintf "%02X", ord; "[$_:$o]" 
	} split //, $string;

The output shows that there’s a null byte at the beginning of the string:

% perl5.12.2 string_fh.pl
...
[:00] [u:75] [s:73] [t:74] [e:65] [r:72] [ :20] [l:6C] [i:69] [k:6B] 
[e:65] [s:73] [ :20] [l:6C] [i:69] [v:76] [e:65] [r:72] [ :20] [t:74] 
[r:72] [e:65] [a:61] [t:74] [s:73] [
:0A] [M:4D] [i:69] [m:6D] [i:69]
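If you inspect strings like this often, you might wrap the one-liner in a subroutine. This is only a sketch: show_bytes is a hypothetical helper name, and replacing unprintable characters with a dot is an assumption made here so that null bytes and newlines stay visible on one line:

```perl
use strict;
use warnings;

# Hypothetical helper: render each character as [char:HEX].
# Unprintable characters show up as a dot so you can still see them.
sub show_bytes {
    my ($text) = @_;
    return join ' ', map {
        my $hex     = sprintf '%02X', ord;
        my $visible = /[[:print:]]/ ? $_ : '.';
        "[$visible:$hex]";
    } split //, $text;
}

# A null byte, two letters, and a newline
print show_bytes("\0us\n"), "\n";
```

For that sample input the helper returns [.:00] [u:75] [s:73] [.:0A].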

That’s odd. Instead of assigning the empty string, assign something else:

use strict;
use warnings;

open my $string_fh, '>', \my $string;
print $string_fh "Buster likes liver treats\n";
print "1. string is [$string]\n";

$string = 'Roscoe';
print "2. string is [$string]\n";

print $string_fh "Mimi";
print "3. string is [$string]\n";

print join ' ', map { 
	my $o = sprintf "%02X", ord; "[$_:$o]" 
	} split //, $string;

Now the output looks a bit different. The string Roscoe overwrites the beginning of the previous string. Even though that previous output shouldn’t be there, this output almost looks more reasonable. However, when you look closer you see that Roscoe actually overwrote more than its six characters. There’s another null byte in there (perhaps an odd C string effect?):

% perl5.12.2 string_fh.pl
1. string is [Buster likes liver treats
]
2. string is [Roscoe]
3. string is [Roscoelikes liver treats
Mimi]
[R:52] [o:6F] [s:73] [c:63] [o:6F] [e:65] [:00] [l:6C] [i:69] [k:6B] [e:65] [s:73] [ :20] [l:6C] [i:69] [v:76] [e:65] [r:72] [ :20] [t:74] [r:72] [e:65] [a:61] [t:74] [s:73] [
:0A] [M:4D] [i:69] [m:6D] [i:69]

What’s going on? This might be a bug in perl (RT 78980), but it’s the sort that you run into when you do something that you shouldn’t be doing. You affect the string through two different interfaces, and perl does what you tell it without a warning. It looks like the filehandle keeps its own idea of where the data are and where the next write should go, and that conflicts with what you assigned to the string. It doesn’t help to unbuffer the filehandle either:

open my $string_fh, '>', \my $string;
my $old_fh = select( $string_fh ); # doesn't help
$|++;
select( $old_fh );

So far, you’ve seen the problem when you make the string shorter than the filehandle thinks it should be. What happens if you go the other way by making the string longer than it should be?

open my $string_fh, '>', \my $string;

print $string_fh "Roscoe";
print "1. string is [$string]\n";

$string = "Buster likes liver treats\n";
print "2. string is [$string]\n";

print $string_fh "Mimi";
print "3. string is [$string]\n";

print join ' ', map { 
	my $o = sprintf "%02X", ord; "[$_:$o]" 
	} split //, $string;

Now you replace Roscoe completely, but when you print to the string filehandle again, it picks up where it left off, which is in the middle of your new string:

1. string is [Roscoe]
2. string is [Buster likes liver treats
]
3. string is [BusterMimies liver treats
]
[B:42] [u:75] [s:73] [t:74] [e:65] [r:72] [M:4D] [i:69] [m:6D] [i:69] [e:65]
[s:73] [ :20] [l:6C] [i:69] [v:76] [e:65] [r:72] [ :20] [t:74] [r:72] [e:65] 
[a:61] [t:74] [s:73] [
:0A]

This time there’s no null byte, but the string is still confused.
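One way to see why the handle "picks up where it left off" is to ask it for its position with tell. This sketch assumes that tell works on a string filehandle the same way it does on a file, and it only compares the handle's position with the length of the string:

```perl
use strict;
use warnings;

open my $string_fh, '>', \my $string
    or die "Could not open string filehandle: $!";

print {$string_fh} "Roscoe";
my $before = tell $string_fh;   # position after writing six characters

$string = '';                   # change the scalar behind the handle's back
my $after = tell $string_fh;    # the handle's position has not changed

printf "string length is %d, handle position is %d (was %d)\n",
    length($string), $after, $before;
```

The assignment changes the length of the string, but the filehandle's position stays where the last print left it, so the next print starts from there.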

So, how do you fix this? There’s this guy who tells his doctor “It hurts when I move my arm”, so the doctor says “Well, don’t move your arm”. Symptom solved. If you have a string that you want to write to through a filehandle, don’t change it any other way.
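In practice, "don't move your arm" can mean working on a copy. This is a minimal sketch, with $copy as an illustrative variable name: the filehandle stays the only thing that writes to $string, and any ordinary string operations happen on the copy:

```perl
use strict;
use warnings;

open my $string_fh, '>', \my $string
    or die "Could not open string filehandle: $!";

print {$string_fh} "Buster likes liver treats\n";

# Take a snapshot and change the snapshot, not the original.
my $copy = $string;
chomp $copy;

# The handle and its string still agree, so this appends as expected.
print {$string_fh} "Mimi";

print "copy   is [$copy]\n";
print "string is [$string]\n";
```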

However, maybe you want to move your arm even though it hurts. When you want to treat it as a normal string, fix up the filehandle bits at the same time. You might think you can use seek and tell to move around the string (although truncate apparently doesn’t do the right thing at all). The first program changes to:

use Fcntl qw(:seek);

open my $string_fh, '>', \my $string;
print $string_fh "Buster likes liver treats\n";
print "1. string is [$string]\n";

seek $string_fh, 0, SEEK_SET;
$string = '';
print "2. string is [$string]\n";

print $string_fh "Mimi";
print "3. string is [$string]\n";

Now, since you tell the $string_fh to go to the beginning, that’s where it thinks it should start:

% perl5.12.2 seek.pl
1. string is [Buster likes liver treats
]
2. string is []
3. string is [Mimi]

Don’t get too excited though. It works in this case, but what if you don’t seek to the beginning?

use Fcntl qw(:seek);

open my $string_fh, '>', \my $string;
print $string_fh "Buster likes liver treats\n";
print "1. string is [$string]\n";

seek $string_fh, 5, SEEK_SET;
$string = '';
print "2. string is [$string]\n";

print $string_fh "Mimi";
print "3. string is [$string]\n";

Now you’re back to odd output, with a null byte at the beginning and four characters from the previous string:

% perl5.12.2 seek5.pl
1. string is [Buster likes liver treats
]
2. string is []
3. string is [usteMimi]

That output is a bit different from what you might expect if you had seeked to the same position before you output anything (and didn’t otherwise change the scalar):

use 5.010;
use strict;
use warnings;

use Fcntl qw(:seek);

open my $string_fh, '>', \my $string;
seek $string_fh, 5, SEEK_SET;

print $string_fh "Mimi";
print "0. string is [$string]\n";

It looks almost like it’s right at first glance:

% perl5.12.2 seek52.pl
0. string is [Mimi]

However, when you look more closely you see that there are a bunch of null bytes before Mimi:

% perl5.12.2 seek52.pl | hexdump -C
00000000  30 2e 20 73 74 72 69 6e  67 20 69 73 20 5b 00 00  |0. string is [..|
00000010  00 00 00 4d 69 6d 69 5d  0a                       |...Mimi].|
00000019

So what if you really do want to truncate the string? You can’t use truncate, because a string filehandle doesn’t have a real file descriptor:

truncate $string_fh, 0 or warn "Could not truncate: $!\n";

You get the warning:

Could not truncate: Bad file descriptor

So far, that’s not the right answer either.
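One thing you might experiment with is opening the string in append mode instead. This sketch rests on an assumption that you should verify on your own perl: that a string filehandle opened with '>>' writes at the current end of the string on every print, rather than at a remembered position. It isn't something the bug report promises:

```perl
use strict;
use warnings;

my $string = '';

# Assumption: in append mode, each write goes to the current end
# of the string, whatever has happened to the string in between.
open my $string_fh, '>>', \$string
    or die "Could not open string filehandle: $!";

print {$string_fh} "Buster likes liver treats\n";

$string = '';                      # shorten the string directly
print {$string_fh} "Mimi";
my $after_shorter = $string;

$string = "Buster likes liver treats\n";   # lengthen it directly
print {$string_fh} "Mimi";
my $after_longer = $string;

print "after shortening:  [$after_shorter]\n";
print "after lengthening: [$after_longer]\n";
```

If the assumption holds, both prints land at the end of whatever the string currently contains. You are still changing the string through two interfaces, so treat this as an experiment rather than a guarantee.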

You might think that you could close the filehandle and manipulate it afterward, but how would you know if some other part of the program still expects to print to that filehandle?
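A variation that you might try is reopening the same filehandle variable on the same string instead of closing it. This is only a sketch, and $other_fh is a hypothetical stand-in for another part of the program that holds the same handle. Opening with '>' empties the string and starts the handle at the beginning, and anything holding the same handle keeps printing to the reopened one:

```perl
use strict;
use warnings;

open my $string_fh, '>', \my $string
    or die "Could not open string filehandle: $!";

# Pretend some other part of the program kept a copy of the handle.
my $other_fh = $string_fh;

print {$string_fh} "Buster likes liver treats\n";

# Reopen on the same variable: perl closes the old handle, empties
# the string, and starts again at the beginning.
open $string_fh, '>', \$string
    or die "Could not reopen string filehandle: $!";

print {$other_fh} "Mimi";
print "string is [$string]\n";
```

This only helps when you want to throw away everything in the string. It doesn't let you assign arbitrary new contents.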

Things to remember

  • You can print to a string through a filehandle
  • Don’t manipulate the string behind a string filehandle with the normal string operations
  • Don’t assign to the string behind a string filehandle

One thought on “Avoid modifying scalars connected to string filehandles”

  1. That’s interesting but it’s not surprising that modifying a string doesn’t affect any filehandles connected to it. Making $string = '' do a seek $fh, 0, SEEK_SET would be rather magical. Making non-empty assignments (i.e. longer or shorter values) affect the handle would be incredibly confusing IMHO.

    One thing about this did surprise me a little, and that’s that the print after the second write showed anything at all. Given that Perl is implemented in C I would have expected C behavior: interpreting the leading null as the end of the string.
