Friday, 31 July 2015

Python "numpy.dtype" structure for reading binary to "list" with "numpy.fromfile"

+++ WARNING, THE FOLLOWING CONTAINS VERY UGLY PROGRAMMING +++

+++ PLEASE HELP!!! +++

Hey, I have been playing around with my read-in routines for quite a long time and I still have not figured out a good and fast way!

I have something like this: a huge binary file, which I want to slice down to a numpy array!

I created this structure to read a certain number of bytes with fromfile:

    import numpy

    # One record: 8 + 4 + 53 * 4 = 224 bytes
    mydt = numpy.dtype([
        ('col1', numpy.uint64),
        ('col2', numpy.int32),
        ('cols3_56', numpy.float32, (53,)),
    ])

I read it like this:

    data_block = numpy.fromfile(openfile, dtype=mydt, count=ntimes)

What I am getting out is something like this:

[(88000031189210L, 1, [-1000.0, -1000.0, -1000.0, -2.0, -2.0, -2.0, 65004000.0, 0.0, 760680000.0, 0.0, 0.12124349921941757, 0.04971266910433769, 2328.39990234375, 0.00013795999984722584, 0.0, 0.0, -1.0, -1.0, -1.0, 65004000.0, -1.0, 760680000.0, 0.0, 0.0, -1.0, 825680000.0, 0.0, -1.0, -1.0, -1.0, 157630.0, 0.0, 756310.0, 0.0, -1.0, -1.0, 0.0, 5.250500202178955, 0.0, 5.250500202178955, -13.602999687194824, -16.760000228881836, -17.283000946044922, -16.95800018310547, -17.513999938964844, -17.57200050354004, -13.657999992370605, -16.77199935913086, -17.291000366210938, -16.9689998626709, -17.520999908447266, -17.57200050354004, 1.0]), [(88......1L, 1, [-1000.0, ....]), ....
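To make the record layout concrete, here is a minimal round-trip sketch. It assumes the same dtype as above; the file name `demo.bin`, the record count, and the fill values are made up for illustration and are not from the real data file.

```python
import os
import tempfile

import numpy as np

# Same record layout as above: 8 + 4 + 53 * 4 = 224 bytes per record.
mydt = np.dtype([
    ('col1', np.uint64),
    ('col2', np.int32),
    ('cols3_56', np.float32, (53,)),
])

# Hypothetical test data: 5 records with made-up values.
ntimes = 5
records = np.zeros(ntimes, dtype=mydt)
records['col1'] = np.arange(ntimes, dtype=np.uint64)
records['col2'] = 1
records['cols3_56'] = -1000.0

with tempfile.TemporaryDirectory() as tmpdir:
    # 'demo.bin' is a placeholder name, not the real file.
    path = os.path.join(tmpdir, 'demo.bin')
    records.tofile(path)
    with open(path, 'rb') as openfile:
        data_block = np.fromfile(openfile, dtype=mydt, count=ntimes)

print(mydt.itemsize)           # bytes per record: 224
print(data_block.shape)        # one entry per record
print(data_block.dtype.names)  # field names usable as column keys
```

The result is a one-dimensional structured array, so each printed tuple corresponds to one 224-byte record.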

Then I extend my array with this data block:

    data_block_array.extend(data_block)

... and this is repeated millions of times ...
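One possible alternative to extending a Python list record by record is to keep each block as an array and concatenate once at the end. The following is only a sketch under assumptions: the block size, the demo file, and the names `blocks` and `data` are hypothetical, and it assumes the combined data fits in memory.

```python
import os
import tempfile

import numpy as np

mydt = np.dtype([
    ('col1', np.uint64),
    ('col2', np.int32),
    ('cols3_56', np.float32, (53,)),
])

ntimes = 4   # records per read (hypothetical block size)
total = 10   # records in the hypothetical demo file

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.bin')
    demo = np.zeros(total, dtype=mydt)
    demo['col2'] = np.arange(total, dtype=np.int32)
    demo.tofile(path)

    blocks = []
    with open(path, 'rb') as openfile:
        while True:
            data_block = np.fromfile(openfile, dtype=mydt, count=ntimes)
            if data_block.size == 0:
                break                   # nothing left to read
            blocks.append(data_block)   # keep the array, not its elements

# A single concatenation yields one structured array for all records.
data = np.concatenate(blocks)
print(data.shape)
```

Appending whole arrays avoids turning every record into a separate Python object, which is what `list.extend` does when it iterates over the block.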

Now I want to access two things:

  • the 2nd element in the above structure (in this example "1") for the entire data array, which holds a couple of million of the records shown above
  • the 8th (in total the 12th) element in the 53-column data block for the entire array, again millions of substructures!

I got that working by looping over a count:

    i = 0
    while i < count:
        self.data_array[i, element1] = data_block_array[i][1]
        self.data_array[i, element8] = data_block_array[i][2][13]
        i += 1  # advance to the next record

which is incredibly slow ... I would like to develop a very fast and easy way to filter my data that way and extract the columns I am interested in. I would appreciate some advice and insights!
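For comparison, here is a sketch of how the same two columns could be pulled out without a Python loop, using field access by name on a structured array. It assumes all records are held in one structured array (called `data` here, a hypothetical name) rather than in a list; the fill values and the target column positions `element1` and `element8` are made up for the demo.

```python
import numpy as np

mydt = np.dtype([
    ('col1', np.uint64),
    ('col2', np.int32),
    ('cols3_56', np.float32, (53,)),
])

# Hypothetical stand-in for the records read from the file.
count = 1000
data = np.zeros(count, dtype=mydt)
data['col2'] = 1
data['cols3_56'] = np.arange(53, dtype=np.float32)  # same row for every record

# Hypothetical target columns in the output array.
element1, element8 = 0, 1
data_array = np.empty((count, 2), dtype=np.float64)

# Field access by name returns the whole column at once.
data_array[:, element1] = data['col2']
# Index 13 of the 53-value sub-array, matching the loop above.
data_array[:, element8] = data['cols3_56'][:, 13]

print(data_array[:3])
```

Both assignments operate on entire columns, so the per-record indexing from the `while` loop is not needed.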
