Hi Kei,
This is a nice question. I used to use streaming medians as an interview question. There are many options depending on how accurate you need to be, each with a different memory/accuracy tradeoff.
At the simplest level, you could just store the median each day and then calculate the median of those daily medians. This is what KDB used to do to calculate medians over partitions prior to v3.0. It is very simple, but it sacrifices a bit of accuracy.
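Roughly, the idea is (illustrating in Python rather than q, with made-up data and names):

    import statistics

    # hypothetical per-day trade sizes, keyed by date
    daily_sizes = {
        "2024-01-01": [100, 200, 150, 400],
        "2024-01-02": [50, 300, 250],
        "2024-01-03": [120, 180, 220, 90, 500],
    }

    # store only one number per day: that day's median trade size
    daily_medians = {d: statistics.median(s) for d, s in daily_sizes.items()}

    # approximate the rolling median as the median of the daily medians
    print(statistics.median(daily_medians.values()))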
To be completely accurate we don't need to store every trade; at minimum we need to store the number of trades at each distinct trade size, i.e. to bin the data. Trade sizes are discrete, so this should cut things down significantly. The median can then be reconstructed by taking the 30-day rolling sum of the counts in each bin and finding, cross-sectionally, where the cumulative distribution (CDF) crosses 0.5.
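A rough Python sketch of that reconstruction (per-day counts, window length and data are invented; ties between the two middle values are glossed over):

    from collections import Counter
    from itertools import accumulate

    daily_counts = [
        Counter({100: 3, 200: 1, 500: 1}),  # day 1: trade size -> number of trades
        Counter({100: 2, 300: 2}),          # day 2
        Counter({200: 4, 400: 1}),          # day 3
    ]

    def rolling_median(counts, window=30):
        """Median of all trades in the trailing window, from binned counts only."""
        out = []
        for i in range(len(counts)):
            total = Counter()
            for c in counts[max(0, i - window + 1): i + 1]:
                total.update(c)              # rolling sum of the per-size counts
            sizes = sorted(total)
            n = sum(total.values())
            cum = list(accumulate(total[s] for s in sizes))
            # first size at which the empirical CDF reaches 0.5
            out.append(next(s for s, c in zip(sizes, cum) if 2 * c >= n))
        return out

    print(rolling_median(daily_counts, window=3))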
Trade sizes start at zero, so the size of our bins table is driven by the largest sizes and could be bulked up by large but rare trades which are unlikely to have an effect on the median. To combat this you could clip the trade sizes each day at, say, the 90th percentile (but still include those trades at the clipped size), which would save some space. It probably wouldn't affect the accuracy, but it could introduce some bias, especially as you transition between trading environments, e.g. from low to high volatility.
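Something along these lines (again Python for illustration; the percentile cutoff is computed crudely):

    from collections import Counter

    def clipped_counts(sizes, pct=0.90):
        s = sorted(sizes)
        cap = s[int(pct * (len(s) - 1))]               # crude 90th-percentile cutoff
        return Counter(min(x, cap) for x in sizes)      # clip, but still count the trade

    print(clipped_counts([100, 120, 150, 200, 250, 300, 10_000]))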
To combat this bias, an alternative would be to use batch reservoir sampling to decide which trades to include each day. This is more advanced, but is likely to give a very accurate answer without any bias.
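For reference, the classic reservoir scheme (Algorithm R) keeps a fixed-size, unbiased sample of each day's trades; k=1000 here is arbitrary:

    import random

    def reservoir_sample(trades, k=1000):
        sample = []
        for i, t in enumerate(trades):
            if i < k:
                sample.append(t)
            else:
                j = random.randint(0, i)     # overall, each trade survives with probability k/n
                if j < k:
                    sample[j] = t
        return sample

    day_sample = reservoir_sample(range(1_000_000), k=1000)
    print(len(day_sample))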
If space is at an absolute premium and you would only like to store a few values per sym per day, then you could look at a technique called frugal streaming, but I don't think it can be vectorized.
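The simplest variant (often called Frugal-1U) keeps a single running estimate and nudges it up or down one step per observation, which is why it resists vectorization; a sketch with made-up data:

    import random

    def frugal_median(stream, step=1):
        m = 0
        for x in stream:
            if x > m:
                m += step    # estimate too low: nudge it up
            elif x < m:
                m -= step    # estimate too high: nudge it down
        return m

    sizes = [random.choice([100, 200, 200, 300, 500]) for _ in range(10_000)]
    print(frugal_median(sizes))    # hovers near the true median of 200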
Best,
George