KX Community

  • Need help to read hdf. file written in python to kdb+?

    Posted by mshk on May 30, 2022 at 12:00 am

    Hi Community,

    I am using the HDF5 interface (https://code.kx.com/q/interfaces/hdf5/) and trying to read an HDF file (written in Python) in kdb+ (q).

    I have attached the HDF file and the error message.

    Please advise how to read this file.

    rgds,

    Marion

     

  • 3 Replies
  • Laura

    Administrator
    May 30, 2022 at 12:00 am

    The error message JPEG looks like source code; I don't see an HDF file attached. Did I miss something?

     

  • rocuinneagain

    Member
    June 2, 2022 at 12:00 am

    If you run through the example in q and then inspect the created file in Python, you will see the layout the interface expects:

    https://code.kx.com/q/interfaces/hdf5/examples/#create-a-dataset

    Each table column is stored as its own dataset inside a group:

     

    >>> import h5py as h5
    >>> data = h5.File('experiments.h5', 'r')
    >>> data['experiment2_tables']
    <HDF5 group "/experiment2_tables" (1 members)>
    >>> data['experiment2_tables/tab_dset']
    <HDF5 group "/experiment2_tables/tab_dset" (5 members)>
    >>> data['experiment2_tables/tab_dset/class']
    <HDF5 dataset "class": shape (10000,), type "<i2">

     

    (The filename must end in ‘.h5’)

    To store data from Python, match this layout: one group per table, with each column written as its own dataset inside that group.

    import h5py as h5
    import pandas as pd

    df = pd.DataFrame({"AA": [1, 2], "BB": [3, 4], "CC": [5, 6]})
    f = h5.File('forKX.h5', 'w')
    project = f.create_group("project")
    table = project.create_group("table")
    # Write each column as its own dataset inside the "table" group
    for col in df.columns:
        table[col] = df[col].to_numpy()
    f.close()

    kdb+ still does not know that you intend this data to be a table. As outlined in the docs, attributes are needed for that:

    https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries

    Without the attributes, you can reshape the columns into a table like so:

    q){flip x!{.hdf5.readData["forKX.h5";"project/table/",string x]} each x}`AA`BB`CC 
    AA BB CC 
    -------- 
    1 3 5 
    2 4 6

     

     

     

  • rocuinneagain

    Member
    June 2, 2022 at 12:00 am

    This code creates a basic table in a file written by KX:

     

    t:([] AA:1 2;BB:3 4;CC:5 6) 
    .hdf5.createFile["byKX.h5"] 
    .hdf5.createGroup["byKX.h5";"project"] 
    .hdf5.writeData["byKX.h5";"project/table";t]

     

    If we expand out what it is doing, following the documentation, we can create the exact same file by hand:

    https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries

     

    t:([] AA:1 2;BB:3 4;CC:5 6) 
    .hdf5.createFile["diy.h5"] 
    .hdf5.createGroup["diy.h5";"project"] 
    .hdf5.createGroup["diy.h5";"project/table"] 
    {.hdf5.writeData["diy.h5";"project/table/",string x;t x]} each cols t 
    .hdf5.writeAttr["diy.h5";"project/table";"datatype_kdb";"table"] 
    .hdf5.writeAttr["diy.h5";"project/table";"kdb_columns";cols t]

     

    Finally, this is the Python equivalent:

     

    import h5py as h5
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({"AA": [1, 2], "BB": [3, 4], "CC": [5, 6]})
    f = h5.File('forKX.h5', 'w')
    project = f.create_group("project")
    table = project.create_group("table")
    # Attributes that tell the interface this group is a table and name its columns
    table.attrs["datatype_kdb"] = np.array([ord(c) for c in 'table'], dtype=np.int8)
    table.attrs["kdb_columns"] = [x.encode('ascii') for x in df.columns]
    # Write each column as its own dataset inside the "table" group
    for col in df.columns:
        table[col] = df[col].to_numpy()
    f.close()

     

    All three read in the same way:

     

    q).hdf5.readData["byKX.h5";"project/table"]
    AA BB CC
    --------
    1 3 5
    2 4 6
    q).hdf5.readData["diy.h5";"project/table"]
    AA BB CC
    --------
    1 3 5
    2 4 6
    q).hdf5.readData["forKX.h5";"project/table"]
    AA BB CC
    --------
    1 3 5
    2 4 6

     

     

    h5dump is useful for inspecting .h5 files.

    https://support.hdfgroup.org/HDF5/doc/RM/Tools/h5dump.htm

    It prints the shapes and types of the contents of your file:

     

    h5dump diy.h5
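
    If h5dump is not installed, a rough equivalent can be sketched with h5py, which the Python examples above already use. The function name describe is hypothetical and the output format is my own, not h5dump's; it lists every group and dataset, with shape and dtype for datasets, plus the names of any attributes.

```python
import h5py


def describe(path):
    """Return one line per group, dataset and attribute in an HDF5 file."""
    lines = []

    def visit(name, obj):
        # visititems calls this once for every object below the root group
        if isinstance(obj, h5py.Dataset):
            lines.append(f"dataset {name} shape={obj.shape} dtype={obj.dtype}")
        else:
            lines.append(f"group   {name}")
        for key in obj.attrs:
            lines.append(f"  attr  {name}@{key}")

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return lines


if __name__ == "__main__":
    import sys

    # Usage: python describe.py diy.h5
    for line in describe(sys.argv[1]):
        print(line)
```

    Running it on a file written by the interface and on a file written from Python makes it easy to spot a missing attribute or a differently named column.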

     

    The 3 main takeaways are:

    1. Only supported types are available with this interface: https://code.kx.com/q/interfaces/hdf5/hdf5-types/#type-mapping

    2. In the real world, tabular data from another source in a .h5 file will not read straight into a kdb+ table. You will need to extract the data column by column, as I showed in a previous example.

    3. If you have a file the interface is unable to read, you can still use embedPy to manipulate the data and transfer it to kdb+.
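
    For takeaways 2 and 3, the Python side of a column-by-column extraction might look like the sketch below. The function name read_columns is hypothetical, it assumes the columns are plain datasets sitting directly under one group, and it does not show how the result is handed over to kdb+.

```python
import h5py


def read_columns(path, group_path):
    """Read every dataset directly under a group into a dict of NumPy arrays."""
    with h5py.File(path, "r") as f:
        group = f[group_path]
        return {
            name: node[()]  # [()] reads the whole dataset into memory
            for name, node in group.items()
            if isinstance(node, h5py.Dataset)  # skip nested groups
        }


if __name__ == "__main__":
    columns = read_columns("forKX.h5", "project/table")
    for name, values in columns.items():
        print(name, values.dtype, values.shape)
```

    The dict keys are the column names, so they can also be used to build the symbol list passed to the q lambda shown earlier in the thread.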

     
