Tuesday 7 May 2024

Fortran binary output -> Python

Since I don't really do anything with my Raspberry Pi, this blog "My Pi's Mad At Me" may as well become a more general computing blog.  And since I have spent a bit of time lately working something out, I thought I'd make a post about it.

I have a moderately large physics code (Sky3d, published here (v1.0) and here (v1.1)) which produces output that can be usefully post-processed to derive further physically interesting results.  These output files are, at the moment, written in the standard unformatted Fortran binary format, which means they can be easily read in by a Fortran code and further processed.  Unlike using, say, a text-based formatted output, there is no loss in precision from the reading and writing process when using unformatted output, though it is not human-readable, and it is the natural thing to do.

Since I now make use of Python for smallish coding jobs as much as I do of Fortran, I wanted to make sure I could read these binary files into Python, and started to look in to how to do it.  Fortran is a relatively small language in which I can, or at least could when I was using it daily, keep most of it in my mind, or know quickly where to look up.  Python is a sprawling archipelago of ideas, and ways of doing things, and sort-of-standard libraries, which is great in letting you do lots of high-level stuff, but hard for anyone not fully immersed in it to know how best to do something, especially when relying on third-party libraries.  So, my first search gave me a method within the very standard SciPy library which could read Fortran binary files, but the documentation cautions that "Since this is a non-standard file format, whose contents depend on the compiler and the endianness of the machine, caution is advised". 

So I rummanged around a bit more and decided to go to the Fortran end of things and re-write the output in a more portable way using stream i/o, which came in to Fortran in the 2003 standard, and which "allows easy interoperability with C binary stream" according to my copy of "Modern Fortran Explained".

I therefore wrote a Fortran code which fairly simply opened an existing records-based Fortran unformatted binary file for reading, and wrote out each variable one-by-one to a new file opened as a stream-based unformatted binary file. 

To see if it worked, I had to read it in to a Python program.  Here, there seem many possible options, and I went for the 'struct' library, partly as it seems to be inside the Python language and so doesn't necessitate downloading and loading and keeping track of versions of other libraries (e.g. numpy.fromfile probably does the same job, but why bring in a large library like numpy if you don't have to?)

So, for example, a binary write of an integer in Fortran like 

write(11) iter

is read in Python as

iter = struct.unpack('i',f.read(4))

where the 'i' means integer and the 4 means 4 bytes.  f is the file handle that has been opened in Python, while the 11  in the Fortran line is the unit number of the file when written out in the Fortran code.

It works nicely, but I was scratching my head for a while when dealing with what Fortran calls logical variables and in Python are booleans.  My Fortran code had been using, and writing out to file Fortran's default logical type, which turns out on my implementation of Fortran to be 4 bytes long, encoded in a slightly strange way.  Consider the following Fortran program                                               
 

program logical_test
  integer :: u
  logical :: l1=.true., l2=.true.
  open(newunit=u,file='data.stream',form='unformatted',access='stream')

  write(u) l1, l2
end program logical_test
 

which writes out two default logical variables of value .true..  When run, the resulting file data.stream is 8 bytes long and looks like this:

[~] od -x data.stream                                                 
0000000      0001    0000    0001    0000                              
0000010

so there are two sets of 4 bytes, which both look like 00 01 00 00 (in hex), and the interesting information is encoded in the second byte of 4.  Well, a weird choice, and one that can only be read in the Python method above if you know exactly how this Fortran data type is encoded, and know to read in the padding bytes and throw them away.  That's not what I want at all, and before understanding this padding I was confused why other variables printed out after the logical variable appeared to be read in as garbage.

My initial solution was to not use logical/boolean data types at all, but to encode the information as integers which I knew how to encode and transfer in the binary files.  Then I looked up the types (as Fortran calls them) in the intrinsic iso_c_binding module, and there is one called c_bool.  If I use that, then it gives a 1-byte representation and I can easily read that using the Python struct library that I showed above:

Now my test Fortran program reads:

program logical_test
  use iso_c_binding
 
  integer :: u
  logical(c_bool) :: l1=.true., l2=.true.
  open(newunit=u,file='data.stream',form='unformatted',access='stream')

  write(u) l1, l2
end program logical_test

and my data.stream file is 2 bytes long:

[~] od -x data.stream
0000000     0101                                                       

0000002

and I can read it in Python as

import struct

f=open('data.stream','rb')
print(struct.unpack('?',f.read(1))[0])
print(struct.unpack('?',f.read(1))[0])

and the result is

[~] python logical.py
True
True

and now numerical data coming after that can be read safely and I don't need to worry about where in an 4-byte string a logical value is being encoded.