
Need help reading an HDF file written in Python into kdb+

MSHK
New Contributor III

Hi Community,

I am using the HDF5 interface (https://code.kx.com/q/interfaces/hdf5/) and trying to read an HDF file (written in Python) in kdb+ (q).

Attached HDF file and error message.

Please advise how to read this file.

 

rgds,

Marion

 


SJT
Valued Contributor

The error message JPEG looks like source code; I don’t see an HDF file attached. Did I miss something?

 

MSHK
New Contributor III

Hi, 

How can I attach a (.hdf) file in this community?

The error message below says the only valid file types are jpg, gif, png, json and txt.

Please advise.

Unless I can email it to you from my @first derivatives.com account?

 

rgds,

Marion

SJT
Valued Contributor

Try appending .txt to the filename, e.g. attach file.hdf as file.hdf.txt?

MSHK
New Contributor III

Hi,

Please find the file attached as test.hdf.txt for your review.

rgds,

Marion

MSHK
New Contributor III

Hi,

I tried to attach a .txt file, but that format was not accepted.

So I took a screenshot of the .hdf file and attached it as a PNG for your review.

How can I send a .hdf or .txt file here? It is not allowed, sorry!

MSHK
New Contributor III

Hi,

I tried again just now and managed to attach the file as a .txt file.

Can you please have a look and advise?

Thank you so much!

Hi MSHK - there does not seem to be any txt file attached, only PNG files. (They contain real PNG data, so renaming them to .hdf does not give a valid HDF file.)

MSHK
New Contributor III

Hi,

Thanks for checking.

There was an error attaching the .txt file: after I typed the message, the file could not be attached and was removed. How can I send the sample HDF file to you?

rocuinneagain
Contributor III

If you run through the example in q and then inspect the created file in Python, you will see what the interface expects:

https://code.kx.com/q/interfaces/hdf5/examples/#create-a-dataset 

The table columns are stored individually inside groups:

 

>>> import h5py as h5
>>> data = h5.File('experiments.h5', 'r')
>>> data['experiment2_tables']
<HDF5 group "/experiment2_tables" (1 members)>
>>> data['experiment2_tables/tab_dset']
<HDF5 group "/experiment2_tables/tab_dset" (5 members)>
>>> data['experiment2_tables/tab_dset/class']
<HDF5 dataset "class": shape (10000,), type "<i2">

 

(The filename must end in '.h5')

To store data from Python, match this layout, with the table as a group and each column as a dataset inside it.

import h5py as h5
import pandas as pd

df = pd.DataFrame({"AA":[1, 2], "BB":[3, 4], "CC":[5, 6]})


f = h5.File('forKX.h5','w')
project = f.create_group("project")
table = project.create_group("table")
for col in df.columns:
    table[col] = df[col].to_numpy()

f.close()

kdb+ still does not know you intend this data to be a table.

As outlined in the docs (https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries), attributes would be needed.

Without the attributes, you can reshape the columns into a table like so:

q){flip x!{.hdf5.readData["forKX.h5";"project/table/",string x]} each x}`AA`BB`CC
AA BB CC
--------
1  3  5
2  4  6

 

 

 

This code creates a basic table in a file written by KX:

 

t:([] AA:1 2;BB:3 4;CC:5 6)

.hdf5.createFile["byKX.h5"]
.hdf5.createGroup["byKX.h5";"project"]
.hdf5.writeData["byKX.h5";"project/table";t]

 

If we expand out what it is doing to match the documentation we can create the exact same file with:

https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries 

 

t:([] AA:1 2;BB:3 4;CC:5 6)

.hdf5.createFile["diy.h5"]
.hdf5.createGroup["diy.h5";"project"]
.hdf5.createGroup["diy.h5";"project/table"]
{.hdf5.writeData["diy.h5";"project/table/",string x;t x]} each cols t
.hdf5.writeAttr["diy.h5";"project/table";"datatype_kdb";"table"]
.hdf5.writeAttr["diy.h5";"project/table";"kdb_columns";cols t]

 

Finally, this would be the Python equivalent:

 

import h5py as h5
import pandas as pd
import numpy as np 

df = pd.DataFrame({"AA":[1, 2], "BB":[3, 4], "CC":[5, 6]})


f = h5.File('forKX.h5','w')
project = f.create_group("project")
table = project.create_group("table")
table.attrs["datatype_kdb"] = np.array( [ord(c) for c in 'table'], dtype=np.int8)
table.attrs["kdb_columns"] = [x.encode('ascii') for x in df.columns]
for col in df.columns:
    table[col] = df[col].to_numpy()

f.close()

 

All three read in the same way:

 

q).hdf5.readData["byKX.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6
q).hdf5.readData["diy.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6
q).hdf5.readData["forKX.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6

 

 

h5dump is useful for inspecting h5 files:

https://support.hdfgroup.org/HDF5/doc/RM/Tools/h5dump.htm 

Run against a file, it prints the shapes and types of the file's contents:

 

h5dump diy.h5
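
If h5dump is not to hand, a similar overview can be had from Python with h5py. The following is only a rough sketch: the helper name describe is made up for illustration, and it prints far less detail than h5dump does.

```python
import h5py as h5


def describe(path):
    """Return one summary line per group or dataset found in the file.

    `describe` is a hypothetical helper for this post, not part of h5py.
    """
    lines = []

    def visit(name, obj):
        # visititems calls this once per object, passing the path relative to the root.
        kind = "dataset" if isinstance(obj, h5.Dataset) else "group"
        detail = f" shape={obj.shape} dtype={obj.dtype}" if kind == "dataset" else ""
        attrs = f" attrs={sorted(obj.attrs.keys())}" if len(obj.attrs) else ""
        lines.append(f"{kind:7} /{name}{detail}{attrs}")

    with h5.File(path, "r") as f:
        f.visititems(visit)
    return lines
```

Calling print("\n".join(describe("diy.h5"))) should show whether the table is stored as a group with one dataset per column, and whether the datatype_kdb and kdb_columns attributes are present on that group.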

 

 

 

The three main takeaways are:

1. The interface handles only the types listed in its type mapping: https://code.kx.com/q/interfaces/hdf5/hdf5-types/#type-mapping 

2. In the real world, tabular data from another source in a .h5 file will not read straight into a kdb+ table. You will need to extract the data column by column, as I showed in a previous example.

3. If you have a file the interface is unable to read, you can still use embedPy to manipulate the data and transfer it to kdb+.
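
For the third takeaway, the Python half of such a fallback might look something like the sketch below. It assumes the columns sit as datasets directly under one group, as in the forKX.h5 example. The function name read_columns is invented here, and the embedPy hand-off to kdb+ is not shown.

```python
import h5py as h5


def read_columns(path, group):
    """Read every dataset directly under `group` into a dict of NumPy arrays.

    `read_columns` is a hypothetical helper; nested groups are skipped.
    """
    with h5.File(path, "r") as f:
        return {
            name: obj[()]  # obj[()] loads the whole dataset into memory
            for name, obj in f[group].items()
            if isinstance(obj, h5.Dataset)
        }
```

The result is a plain dictionary keyed by column name, e.g. read_columns("forKX.h5", "project/table"), which is a shape that should be straightforward to turn into a table once it is on the q side.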