storage savings by using enum

analyst_tech_jo
New Contributor
I want to calculate the storage savings from using enums, based on the example below and some assumptions about how much storage each datatype takes:

list:`abc`def`ghi`abc`def  / 5 elements * 3 char/ element * 1byte / char = 15 bytes
uniql:(?)list / 3 elements *  3 chars /element * 1 byte / chars = 9bytes 
enum:`uniql?list / 5 elements * 1 short / element * 1byte / short = 5 bytes

Total saving : 15 bytes - 9 bytes - 5 bytes = 1 byte

Questions:
How many bytes does it take to store a symbol?
How many bytes does it take to store a short / int / long?
Is the calculation right?
4 REPLIES

Jonathon_McMurr
New Contributor

Hi

 

There's a slight problem with your thinking here. You can't save an unenumerated symbol list. Even when in memory, a list of symbols is actually enumerated. As you add symbols, each new symbol is added to an internal symbol list, and each element of your list is stored as a pointer to the entry in that internal list. (You can see the number of entries in this list and the memory occupied by it as the last two entries in the dictionary returned by .Q.w[].)

 

This means that regardless of the length of a sym, the sym will be stored once, and on repeated use it will occupy the memory for the pointer, not the actual sym (i.e. 5 syms of length 3 will not take up 15 bytes)
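One rough way to see this for yourself is to compare the `syms` and `symw` entries of .Q.w[] before and after building a long, repetitive symbol list. This is only a sketch: the names `before`, `after` and `v` are made up for illustration, and the exact numbers will vary by session (the variable names themselves are interned too), so no output is shown.

```q
q)before:.Q.w[]`syms`symw      / count of interned symbols, bytes used by the symbol table
q)v:1000000#`abc`def`ghi       / a million elements, but only three distinct symbols
q)after:.Q.w[]`syms`symw
q)after-before                 / growth in the internal symbol list
```

The difference would be expected to be a handful of entries rather than anything proportional to the million elements in v.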

 

When saving a database to disk, symbols must be enumerated also.
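A minimal sketch of what that usually looks like for a splayed table, using .Q.en to enumerate the symbol columns against a sym file. The directory `db`, the table `t` and the column `px` are hypothetical names chosen for this example.

```q
q)t:([] sym:`abc`def`ghi`abc`def; px:1 2 3 4 5f)
q)`:db/t/ set .Q.en[`:db] t    / enumerate symbol columns against db/sym, then splay under db/t/
q)get `:db/sym                 / the enumeration domain written alongside the table
```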

 

The other option (for unenumerated storage) would be to save as strings. This will result in saving two files: one a list of pointers (so number of elements * length of pointer), the other containing the actual data (which will increase as string size increases).

 

Assuming the vector is repetitive, it is a good candidate for being of type sym, and this is how it should be stored. However, you should note that compression is not the purpose of enumeration. It can be a by-product in some cases, but it's not designed for that. kdb+ does have built-in compression support, though; see here for more details: http://code.kx.com/wiki/Cookbook/FileCompression

 

Hope that helps

Jonathon

 

 

From: analyst
Sent: 25 November 2016 15:43
To: Kdb+ Personal Developers
Subject: [personal kdb+] storage savings by using enum

 


Hi,

1. I realize in my first post the number of bytes taken by data types might not be right. I looked up the wiki for the right sizes.

2. If I may ask, I could not find any direct documentation for the following: "Even when in memory, a list of symbols is actually enumerated." What you are saying makes sense, but I am approaching this based on the documentation and I don't see it anywhere. Can you point me to somewhere that shows it?

>list of symbols

in a liberal sense of 'enumeration', a list of symbols could be said to be enumerated over a large domain from zero to the value of the largest pointer to the symbol table.

in a kdb sense, a list of symbols is not enumerated.  the enumeration of a list of symbols, v is:
  e:`u$v where u:distinct v

e mostly behaves in the same way as v, except you can cast e to a number and see the normalization:
q)u:distinct v:`a`b`a`a
q)e:`u$v
q)"i"$e
0 1 0 0i

as you can see in this example, the space needed for each value is just one bit, but history seems to indicate that kdb+ uses an int vector (4*n bytes) to store e.

>storage
hopefully you only need to calculate storage of lists - all other things like headers and symbol table being small in comparison.

>How many bytes to store a symbol ?
it's a pointer, so 4 or 8 bytes * n

>How many bytes to store a short / int / long ?
2, 4 and 8 bytes * n

[atoms are 16 bytes]
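putting those per-element figures together for the list in the original question, as a rough sketch only: it assumes 8-byte pointers (64-bit build) and the 4-byte enumeration index mentioned above, and it ignores vector headers and the symbol table itself. the function names plain and enumd are made up for illustration, and the index width may differ between kdb+ versions, so it is left as a parameter.

```q
q)plain:{[n;ptr] n*ptr}                    / plain symbol list: n pointers
q)enumd:{[n;u;ptr;idx] (n*idx)+u*ptr}      / n indexes plus a domain of u pointers
q)plain[5;8]
40
q)enumd[5;3;8;4]
44
q)plain[1000000;8]-enumd[1000000;3;8;4]    / the same three symbols repeated to a million elements
3999976
```

so under these assumptions the 5-element example saves nothing in memory, and any saving only appears once the list is long relative to its number of distinct values.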

  e:`u$v where u:distinct v

i meant: the enumeration of a list of symbols, v is e where:
  u:distinct v
  e:`u$v
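applied to the list from the original question, the corrected definition would look something like this (a short session sketch; v, u and e are just the names used above):

```q
q)v:`abc`def`ghi`abc`def
q)u:distinct v          / the domain: `abc`def`ghi
q)e:`u$v                / v enumerated over u
q)"i"$e                 / the underlying indexes into u
0 1 2 0 1i
q)v~value e             / de-enumerating gives back the original list
1b
```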