
Storage architecture for KDB

Brazil
New Contributor

Hi,

I am looking for some advice regarding the storage architecture for KDB. 

We currently have a physical KDB server attached to a storage array that presents 4 disks to Linux, which are then combined into a volume group. KDB accesses the volume group and stores its data spread across it.

The storage array is reaching end of life, and I am now looking at the best way to present the storage from a NetApp array. We have also built a new virtualized VMware server.

Presenting the storage through LUNs and Fibre Channel over Ethernet is not an option in this scenario.

Ideally, I'd like to present multiple NFS volumes from the NetApp array and mount them in Linux. The advantage of multiple volumes in NetApp is that each volume can be backed up directly from the NetApp array, rather than going through the ESX layer to the virtualized Linux host.

I've been reading about using a par.txt file to combine the Linux NFS volumes so that KDB can simply reference a single database. It appears that data is distributed round robin across the volumes.
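For readers coming from the storage side: par.txt is a plain-text file in the database root that lists one segment directory per line. A minimal Python sketch of writing and reading one, assuming each NFS volume is mounted at its own directory; the mount paths and helper names are hypothetical, and absolute paths are used as is typical.

```python
from pathlib import Path


def write_par_txt(db_root, segment_dirs):
    """Write par.txt in db_root, one segment directory per line.

    db_root and segment_dirs are hypothetical paths; in the setup above each
    segment would be a separate NFS mount from the NetApp array.
    """
    root = Path(db_root)
    root.mkdir(parents=True, exist_ok=True)
    par = root / "par.txt"
    # One directory path per line, each terminated by a newline.
    par.write_text("".join(f"{d}\n" for d in segment_dirs))
    return par


def read_par_txt(db_root):
    """Return the segment directories listed in par.txt, skipping blank lines."""
    lines = (Path(db_root) / "par.txt").read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        mounts = ["/mnt/kdb_seg0", "/mnt/kdb_seg1", "/mnt/kdb_seg2", "/mnt/kdb_seg3"]
        write_par_txt(tmp, mounts)
        print(read_par_txt(tmp))
```

The order of lines matters: as the reply below explains, some built-in functions locate a date by its position in this list.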

I've got a few questions about the implementation around the par.txt file and whether this is the right approach.

  • Are there any options for how the data is distributed across the different volumes? Does it have to be round robin, or can it fill the first disk sequentially and then move on to the second?
  • What happens when all volumes are full and a new volume is added to the par.txt file? I presume at that point it won't continue to use round robin, as only the newly added volume will have capacity for new data?
  • How would I migrate the data from the old KDB server using a volume group to the new KDB server using a par.txt file?
  • If I configure backups to run directly from the volumes on the storage array, how can I determine which volume a piece of data resided on, so I know what to restore, if the data had been distributed round robin?
  • Are there other alternatives to consider (e.g. symlinks) that would be better suited to storage presented as multiple NFS mounts?
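On the backup/restore question, one option that does not depend on any placement rule is to scan the mounted segments and record which volume holds which partition. A minimal Python sketch, assuming date partitions use the usual kdb+ directory naming such as `2024.01.15`; `locate_partition` and `inventory` are hypothetical helpers.

```python
from pathlib import Path


def locate_partition(partition, segment_dirs):
    """Return every segment directory that contains the named partition.

    partition is a date partition directory name such as "2024.01.15".
    This scan works whatever scheme was used to place the partitions.
    More than one hit would suggest the partition exists on several volumes.
    """
    return [seg for seg in segment_dirs if (Path(seg) / partition).is_dir()]


def inventory(segment_dirs):
    """Map each existing segment directory to the sorted subdirectories it holds.

    All subdirectories are listed, so anything that is not a partition
    directory will also appear.
    """
    return {
        seg: sorted(p.name for p in Path(seg).iterdir() if p.is_dir())
        for seg in segment_dirs
        if Path(seg).is_dir()
    }
```

Saving the output of `inventory` alongside each volume backup would let a restore find the right volume without recomputing where a date should have been placed.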

I am approaching this more from a storage perspective and have limited KDB knowledge.

Any advice around options would be appreciated.

Thanks,

Ben


2 REPLIES

Laura
Community Manager

Hi Ben,

Thank you for your great question! 

I have reached out to our KX experts, so hopefully we will have an answer for you shortly.

Thanks,

Laura

rocuinneagain
Valued Contributor

Some built-in functions do make assumptions about how data is stored in segmented databases.

These functions assume that each date is stored in the segment (par.txt entry) whose position equals the date modulo the number of par.txt entries, i.e. round robin.
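The rule above is plain arithmetic. A minimal Python sketch, assuming kdb+'s internal date value (days since 2000.01.01) and 0-based positions in par.txt; the helper names are hypothetical and this is not the actual code of any kdb+ function.

```python
from datetime import date

KDB_EPOCH = date(2000, 1, 1)  # kdb+ date values count days from 2000.01.01


def kdb_date_int(d):
    """Integer value of a date as kdb+ represents it: days since 2000.01.01."""
    return (d - KDB_EPOCH).days


def expected_segment(d, n_segments):
    """0-based par.txt position that round-robin placement assigns to date d."""
    # For a positive n_segments, Python's % always returns 0..n_segments-1.
    return kdb_date_int(d) % n_segments


def find_misplaced(layout, n_segments):
    """Return sorted (date, actual, expected) tuples for out-of-place partitions.

    layout maps a 0-based segment position to the dates stored in that segment.
    """
    out = []
    for actual, dates in layout.items():
        for d in dates:
            exp = expected_segment(d, n_segments)
            if exp != actual:
                out.append((d, actual, exp))
    return sorted(out)
```

Comparing `expected_segment(d, 4)` with `expected_segment(d, 5)` shows that, under this convention, adding a fifth par.txt entry changes the expected segment for most existing dates (only dates whose value modulo 20 is 0 to 3 keep their position). That is relevant to the question about adding a volume once the others are full.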

See related thread:

Solved: .Q.par Doesn't Provide the Correct Result in the S... - KX Community - 14220

And the warning at: https://code.kx.com/q/database/segment/#considerations

"Partition data correctly: data for a particular date must reside in the partition for that date."


However, for querying and other normal operations that do not call these functions, there is no such requirement.


Symlinking is often used in kdb+ systems to provide flexibility in storage layouts.
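As a generic illustration of that flexibility, a partition directory can physically live on one mount while appearing under another directory path. A minimal Python sketch; `link_partition` and the paths are hypothetical, and whether a given layout (for example, linking a partition into the segment where the round-robin convention expects it) suits a particular NFS setup is an assumption to verify.

```python
import os
from pathlib import Path


def link_partition(partition, target_segment, link_segment):
    """Create link_segment/partition as a symlink to target_segment/partition.

    Raises FileNotFoundError if the target partition directory does not exist,
    and FileExistsError (from os.symlink) if the link path already exists.
    """
    target = Path(target_segment) / partition
    link = Path(link_segment) / partition
    if not target.is_dir():
        raise FileNotFoundError(target)
    link.parent.mkdir(parents=True, exist_ok=True)
    # An absolute target path keeps the link valid regardless of the working directory.
    os.symlink(target.resolve(), link, target_is_directory=True)
    return link
```

Because the link stores an absolute path, the target mount must be available at the same path on every host that reads the database.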