Distribution of time-series data into train/dev/test sets for ML

2018.02.21 02:41 PM

I currently have a kdb+ database with ~1 million rows of financial tick data. What is the best way to break up this time-series financial data into train/dev/test sets for ML?

This paper suggests the use of k-fold cross-validation, which partitions the data into complementary subsets. But it's from spring 2014, and after reading it I'm still unclear on how to implement it in practice. Is this the best solution, or is something like hold-out validation more appropriate for financial data? I also found this paper on building a neural network in kdb+, but I didn't see any practical real-world examples of dividing the dataset into appropriate subsets.

Thank you.
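
For time-ordered data, a plain hold-out split is usually made chronologically rather than at random, so that the dev and test sets contain only observations that come after the training period. A minimal Python sketch of that idea follows. The function name `chronological_split` and the 80/10/10 fractions are illustrative assumptions, not something taken from the papers mentioned above, and the rows are assumed to be already sorted by time.

```python
def chronological_split(rows, train_frac=0.8, dev_frac=0.1):
    """Split time-ordered rows into (train, dev, test) without shuffling.

    rows       -- a sequence already sorted by timestamp (assumption)
    train_frac -- fraction of rows used for training (illustrative default)
    dev_frac   -- fraction of rows used for the dev set (illustrative default)
    Whatever remains after train and dev becomes the test set.
    """
    if not (0 < train_frac < 1 and 0 <= dev_frac < 1):
        raise ValueError("fractions must lie between 0 and 1")
    if train_frac + dev_frac >= 1:
        raise ValueError("train_frac + dev_frac must leave room for a test set")

    n = len(rows)
    train_end = int(n * train_frac)            # last training row (exclusive)
    dev_end = train_end + int(n * dev_frac)    # last dev row (exclusive)

    # Slices keep the original order: train is oldest, test is newest.
    return rows[:train_end], rows[train_end:dev_end], rows[dev_end:]


if __name__ == "__main__":
    ticks = list(range(10))  # stand-in for ten time-ordered tick rows
    train, dev, test = chronological_split(ticks)
    print(len(train), len(dev), len(test))
```

Because the split never shuffles, every row in the test set is newer than every row used for training, which is the property a random k-fold split would break.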

6 Replies

2018.02.21 04:35 PM

KX has developed embedPy, which allows q to call Python, including ML libraries like TensorFlow, as in the example here:

If having Python in q opens up some options, look here.

2018.02.22 12:03 AM

Hi,

1 million is a big enough number (though this depends on what exactly you want to do); most benchmark datasets are smaller.

Otherwise you can use data augmentation, data mixing (constructing examples like alpha*ex1+(1-alpha)*ex2), a pretrained model, etc.

WBR, Andrey Kozyrev.
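
The data-mixing formula above, alpha*ex1+(1-alpha)*ex2, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each example is a fixed-length list of floats, the helper names `mix_examples` and `augment` are hypothetical, and alpha is drawn uniformly from [0, 1). Whether mixed tick windows are meaningful for a given model is a separate question, and mixing would normally be applied to the training set only.

```python
import random


def mix_examples(ex1, ex2, alpha):
    """Return the element-wise blend alpha*ex1 + (1-alpha)*ex2."""
    if len(ex1) != len(ex2):
        raise ValueError("examples must have the same length")
    return [alpha * a + (1 - alpha) * b for a, b in zip(ex1, ex2)]


def augment(examples, n_new, seed=0):
    """Create n_new synthetic examples by blending random pairs.

    examples -- list of equal-length feature vectors (assumption)
    n_new    -- number of synthetic examples to generate
    seed     -- seed for reproducibility (illustrative default)
    """
    if len(examples) < 2:
        raise ValueError("need at least two examples to mix")
    rng = random.Random(seed)
    mixed = []
    for _ in range(n_new):
        ex1, ex2 = rng.sample(examples, 2)  # two distinct source examples
        alpha = rng.random()                # mixing weight in [0, 1)
        mixed.append(mix_examples(ex1, ex2, alpha))
    return mixed


if __name__ == "__main__":
    windows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(mix_examples(windows[0], windows[1], 0.5))
    print(len(augment(windows, n_new=4)))
```

If the examples carry labels, the same weights would usually be applied to the labels as well, so that a blended input is paired with a correspondingly blended target.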

On Thursday, 22 February 2018 at 2:01:33 UTC+3, marrowgari wrote:

> I currently have a kdb+ database with ~1 million rows of financial tick data. What is the best way to break up this time-series financial data into train/dev/test sets for ML?
>
> This paper suggests the use of k-fold cross-validation, which partitions the data into complementary subsets. But it's from spring 2014, and after reading it I'm still unclear on how to implement it in practice. Is this the best solution, or is something like hold-out validation more appropriate for financial data? I also found this paper on building a neural network in kdb+, but I didn't see any practical real-world examples of dividing the dataset into appropriate subsets.
>
> Thank you.

2018.02.22 02:17 PM

Thanks for the reply, Andrey.

Augmenting time-series data is not something I'm familiar with. Do you have other examples of how to do this?

2018.02.23 12:35 AM

> cat or not

a picture of a cat is a rectangle of triples

tick data is a sequence of triples, quadruples, quintuples or wider

but simpler nonetheless

2018.02.23 01:15 AM

You need to transform your time series to stationary processes.
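
One common way to move a price series toward stationarity is to work with log returns instead of raw prices. The Python sketch below shows that transformation; the function name `log_returns` is hypothetical, and log returns are only one possible choice (they are often closer to stationary than prices, but that is not guaranteed and should be checked on the actual data).

```python
import math


def log_returns(prices):
    """Convert a time-ordered list of positive prices into log returns.

    The result has one element fewer than the input, because each
    return compares a price with the price immediately before it.
    """
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    # log(p_t / p_{t-1}) for each consecutive pair of prices
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


if __name__ == "__main__":
    prices = [100.0, 101.0, 100.5, 102.0]
    for r in log_returns(prices):
        print(round(r, 6))
```

The transformation only looks backward, so it can be applied before splitting the data without leaking future information into earlier rows.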

Then there are a number of ways to perform cross-validation specific to time-series data; one typical approach uses:

Regards

Xi
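
The specific method referred to in this reply is not preserved in the thread. As an illustration only, one widely used time-series scheme is walk-forward (expanding-window) validation, where each fold trains on everything before a cut-off and tests on the block immediately after it. The Python sketch below assumes rows are indexed 0..n-1 in time order; the name `walk_forward_splits` and its parameters are hypothetical. scikit-learn's `TimeSeriesSplit` offers a comparable splitter for anyone calling Python from q.

```python
def walk_forward_splits(n_samples, n_folds, test_size):
    """Yield (train_indices, test_indices) pairs for walk-forward validation.

    n_samples -- number of time-ordered rows
    n_folds   -- number of train/test pairs to produce
    test_size -- number of rows in each test block
    Each training range starts at row 0 and ends where its test block begins,
    so no test row ever precedes a training row.
    """
    if n_folds < 1 or test_size < 1:
        raise ValueError("n_folds and test_size must be positive")
    first_test_start = n_samples - n_folds * test_size
    if first_test_start < 1:
        raise ValueError("not enough rows for the requested folds")

    for k in range(n_folds):
        start = first_test_start + k * test_size
        yield range(0, start), range(start, start + test_size)


if __name__ == "__main__":
    for train_idx, test_idx in walk_forward_splits(10, n_folds=3, test_size=2):
        print(list(train_idx), list(test_idx))
```

Unlike ordinary k-fold, the folds are not interchangeable: later folds see more history, which mirrors how a model would be retrained as new ticks arrive.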

2018.02.23 01:45 AM

Xi
