HighStakes

Requesting Help to Get Started with Historical Studies and Modeling of Intraday Data.

Hello everyone,

I would like to ask for some help in getting started with my project of statistical analysis, strategy development, modeling and, eventually, automation.

I am a neophyte with regard to this type of work, but I'm not new to trading: I have studied the markets for close to 5 years now. I eventually decided I wanted to day trade, and I've been day trading the ES contract for over a year now. Over the last few months my results have become satisfactory, and I feel that I am starting to understand this market. I already use basic statistics in my trading; the simplest I produce myself in Excel, and some come from external sources. To clarify, I'm not a scalper; I attempt to trade the larger intraday swings.

Now that I have more time to devote outside of market hours, I want to take my trading to the next level and start working more seriously with the data on my own. I consider it a long-term project. At first, I expect my work to complement my current strategy and remove some of the current discretion in my trading. As I learn more, I may rewrite my whole strategy, and eventually I hope to be able to automate my day trading. In essence, I want to become intimately familiar with the market that I'm trading, learn it inside out and capitalize on that.

1) Software for analysis

I know a lot of people use Excel to perform impressive historical studies; Brett Steenbarger comes to mind. I already use some basic Excel myself, and while it probably has a lot of potential for me right now, I feel that I will eventually run into some limitations, so it would be better to focus on learning a more powerful platform from the start.

After some online research, it seems like a mathematical platform such as MATLAB or Mathematica would be my best bet. More specifically, from what I've read, MATLAB seems to be the way to go for a trader. Price is not a major issue for me, as I can buy it with a student discount.

Are there other options that I need to consider, or is MATLAB the way to go?

How big is the learning curve? I have no prior programming experience. What I like about MATLAB (not sure about Mathematica) is that since it is so popular, there seem to be many great resources online where one can get help learning the platform. I also have a few acquaintances who know it very well, and that should prove useful when learning to use it on my own.

2) Data for analysis

This is probably where I'm most clueless and really could use some help.

I currently do not have any intraday data for the ES market, so the first step would be to obtain that.

Do I really need tick data? It seems to me that if I could buy 1-minute data, that would be more than adequate for my current needs.

Do I buy the data from a vendor such as TICK DATA? Or is it a better choice to upgrade to a quality feed such as IQ FEED and rip 1-minute data from there? I'm thinking both practically and economically.

3) Organizing and working with the data

Assuming that I have now bought a platform and data, how do I organize, store and work with my data?

Do I need to set up a database? Can this be done in Excel?

How should I organize and store the statistics and analysis that I produce?

I'm sorry if this is asking too much, but I would be greatly appreciative if someone with knowledge of this could help me cut some corners and point me in the right direction so that I can get started.

If any books on these subjects are worth reading, please let me know.

Thanks very much in advance,

HighStakes


Hi,

Whoa man... you seem to have done a lot of research.

R is preferable because it is open source, which translates to more 'open and collaborative' user groups on the internet. I talked to hedge fund programmers, and there is absolutely nothing that MATLAB can achieve but R cannot.

Collaborating with other people can be very helpful. It makes the learning experience easier and more enjoyable. A lot of quant students use R, so it is easier to make friends who can share their libraries with you.

R has more resources online than MATLAB. Though you have acquaintances who will help you with MATLAB, the other option could be equally viable.

I'd suggest starting with 1-minute data. Since you are not scalping, it should be good enough.

I've not tried it, but a friend who recently got started in R says it is relatively easy. Here are some resources:

dataset - Data APIs/feeds available as packages in R - Statistical Analysis - Stack Exchange

Modern Toolmaking: Backtesting a Simple Stock Trading Strategy: Part 3

Cheers,

DD


in an attempt to help...

Software...

You seem to have this covered, and if it is a long-term project then learning any language will not be a problem. If you go down this route, others are better qualified to answer. I have always used Excel for a few reasons: its flexibility, its ease of use (I use a lot of macros, so I am self-taught in VBA), the massive number of users worldwide and the help available, and cost. The only real downside I see is its speed, and it is not ideal for really big database files. Basically, this all boils down to what suits you and what you like.

Data...

This is the biggest nightmare IMO, mainly because of cost, accuracy and detail, both in terms of instruments and the depth of what you need. Ideally, tick data is the basis for everything and so is the best; however, it is clearly overkill if not needed.
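Because tick data is the finest granularity, any bar size can be rebuilt from it later, which is one reason it serves as the basis for everything. A minimal sketch of that aggregation in Python (used purely for illustration; the same logic ports to MATLAB, R or VBA), assuming ticks arrive as (timestamp, price, size) tuples already sorted by time. The tuple layout and the names `Bar` and `ticks_to_bars` are hypothetical, not any vendor's format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Bar:
    start: datetime  # beginning of the bar's time interval
    open: float
    high: float
    low: float
    close: float
    volume: int


def ticks_to_bars(ticks, minutes=1):
    """Aggregate time-sorted (timestamp, price, size) ticks into OHLCV bars."""
    width = timedelta(minutes=minutes)
    bars = []
    for ts, price, size in ticks:
        # Floor the timestamp to the start of its interval within the day.
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        start = midnight + ((ts - midnight) // width) * width
        if bars and bars[-1].start == start:
            bar = bars[-1]  # same interval: extend the current bar
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price
            bar.volume += size
        else:
            bars.append(Bar(start, price, price, price, price, size))
    return bars
```

Bars are labelled by the start of their interval, and minutes with no trades produce no bar; whether to label by start or end, and whether to fill empty minutes, are choices worth matching to whatever platform you compare against.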

One issue with data is the ongoing maintenance and getting it into formats that are usable and relevant. Plus, the accuracy often leaves a lot to be desired. There are many data vendors, varying in cost and accuracy, and there are probably only about 5 of note; the same names always come up. Much depends on whether you just need a download or live, continuous data going forward. For backtesting, I suggest you go with the simplest, cheapest option: get a download and then build your system. You can add to it later.

Additionally, when trading futures you have the issue of continuous contracts. There are some threads about building those, here and elsewhere... a real pain. (I will try to dig up a file I had on it, but I think it has been uploaded here on TL.)

Data Organisation

See Q2... a lot will depend on how they give you the data and how big it is. I use text files for the same reasons I use Excel, but I think there are probably more efficient/faster ways of doing it. Plus, text files can usually interact with most other systems.

As part of building a backtester, you need two other things of note: how to analyse the data, and how to store and compare the results of the test. Again, Excel is the simplest and easiest.
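Storing each test run in its own file makes runs easy to compare later. A minimal Python sketch (illustration only), assuming one CSV row per trade; the column names in `FIELDS` and the functions `save_trades`, `load_pnls` and `summarize` are hypothetical, and the statistics shown (trade count, win rate, total P&L, maximum drawdown) are just a starting set.

```python
import csv

# Hypothetical column layout: one row per trade.
FIELDS = ["entry_time", "exit_time", "side", "entry", "exit", "pnl"]


def save_trades(path, trades):
    """Write one CSV row per trade (trades is a list of dicts keyed by FIELDS)."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(trades)


def load_pnls(path):
    """Read back the per-trade profit/loss column from a saved run."""
    with open(path, newline="") as f:
        return [float(row["pnl"]) for row in csv.DictReader(f)]


def summarize(pnls):
    """Basic statistics for comparing one test run against another."""
    equity = peak = max_dd = 0.0
    for p in pnls:
        equity += p
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)  # largest drop from a prior equity high
    wins = sum(1 for p in pnls if p > 0)
    return {
        "trades": len(pnls),
        "win_rate": wins / len(pnls) if pnls else 0.0,
        "total_pnl": equity,
        "max_drawdown": max_dd,
    }
```

For example, `summarize(load_pnls("results/run_01.csv"))` next to the same call for `run_02.csv` gives a side-by-side comparison; the file names are illustrative.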

For all of this, work out a plan of what you want, where you want to get to and what is not relevant; it will save you a lot of time.

There are plenty of good ready-built systems out there that will save you a lot of time and effort: MultiCharts, NT, Sierra Chart, TradeStation, etc. Currently I prefer Sierra Chart. So again, don't replicate what has already been done. (I think if you get really detailed you will need to build your own.) Often these systems have a lot, but not quite enough...

hope this helps.


I rip my data off an eSignal data stream. I thought a long time about creating a database for storage and analysis, but I came to the conclusion that simple text files offer the most benefits: (1) the data is human-readable (great for bug hunting and cross-checking), (2) offline storage and backup is a non-issue, and (3) the data is universally accessible and easily transformable. As for data organization, I have a folder per security with a file per contract (e.g. ZC\Z2011_D.txt is daily settlement prices for December corn, ES\H2012_1.txt is one-minute data for March ES; you get the picture). It's not very sexy, but it works.
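The folder-per-security, file-per-contract layout is easy to generate and parse in code. A minimal Python sketch (illustration only), assuming the standard futures month codes (F = January through Z = December) and the `<SYMBOL>/<month code><year>_<bar>.txt` pattern from the examples in this post, with `D` for daily and `1` for one-minute data; the helper names are hypothetical.

```python
from pathlib import Path

# Standard futures month codes, January (F) through December (Z).
MONTH_CODES = "FGHJKMNQUVXZ"


def contract_file(root, symbol, month, year, suffix):
    """Build a path such as <root>/ES/H2012_1.txt (March 2012 ES, 1-minute bars)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    code = MONTH_CODES[month - 1]
    return Path(root) / symbol / f"{code}{year}_{suffix}.txt"


def parse_contract_file(path):
    """Invert contract_file: return (symbol, month, year, suffix)."""
    path = Path(path)
    contract, suffix = path.stem.split("_", 1)  # e.g. "Z2011", "D"
    month = MONTH_CODES.index(contract[0]) + 1
    year = int(contract[1:])
    return path.parent.name, month, year, suffix
```

`pathlib` takes care of the separator, so the same code yields `ES\H2012_1.txt` on Windows and `ES/H2012_1.txt` elsewhere.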

 

A word about ripping data off streams: if you want real tick data, check with your provider first. IB, for instance, does NOT provide real-time tick data; they stream snapshots. Also, check how far back your provider's historical data goes. With eSignal, some contracts go back 30+ years on EOD data, but intraday prices go back a few years at most. If you need more, you'll need to buy your data from a specialized vendor (this can get expensive real fast, so check whether you really need it).

Once you have the data on your hard drive, you'll need to think about scrubbing it (filtering out bad ticks/quotes). You'll find some pretty technical material on the web; google it, then decide whether it's really worth the hassle. Obviously this depends on your strategy: if you trade long-term trends this is gonna be easy (just ignore the errors), but if you scalp this is going to be a major pain.
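One simple scrubbing approach, sketched here in Python as an assumption rather than a recommended standard, is to reject any tick that sits too far from the median of recently accepted prices. The window length and the `max_dev` threshold (in points) are hypothetical defaults to tune per market, and `scrub_ticks` is a hypothetical name.

```python
from collections import deque
from statistics import median


def scrub_ticks(prices, window=20, max_dev=5.0):
    """Drop prices more than max_dev points from the median of recent good ticks.

    Returns (clean_prices, rejected_indices). The first three ticks are
    accepted unchecked because there is no reference level yet.
    """
    recent = deque(maxlen=window)  # last `window` accepted prices
    clean, rejected = [], []
    for i, price in enumerate(prices):
        if len(recent) >= 3 and abs(price - median(recent)) > max_dev:
            rejected.append(i)
            continue
        recent.append(price)
        clean.append(price)
    return clean, rejected
```

Caveat: a genuine gap larger than `max_dev` would be rejected tick after tick, because the median never catches up, so a practical filter needs a reset rule (for example, accepting a new level once several consecutive ticks agree).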

 

If you'd like to trade futures, you'll probably need to construct a perpetual contract for backtesting purposes. I found Ed Seykota's article pretty useful (while you're there, do check out his article on risk management; it's a good read).
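Constructions differ, and the following is not necessarily what Seykota's article describes. One common approach is difference back-adjustment: at each roll date, the gap between the new and the old contract is added to all earlier prices, so the most recent prices stay real and the spliced series has no artificial jump at the roll. A minimal Python sketch, assuming each contract is a dict of date to closing price and that both contracts have a price on the roll date; `back_adjust` is a hypothetical name.

```python
def back_adjust(contracts, roll_dates):
    """Splice futures contracts into one difference-back-adjusted series.

    contracts: list of {date: close} dicts, oldest contract first.
    roll_dates: roll_dates[i] is the first date on which contracts[i + 1]
        replaces contracts[i]; both must have a price on that date.
    Returns a list of (date, adjusted_close) sorted by date.
    """
    if len(roll_dates) != len(contracts) - 1:
        raise ValueError("need exactly one roll date between consecutive contracts")
    series = []
    offset = 0.0
    # Walk from the newest contract back so the latest prices stay unadjusted.
    for i in range(len(contracts) - 1, -1, -1):
        start = roll_dates[i - 1] if i > 0 else None  # first day this contract is used
        end = roll_dates[i] if i < len(roll_dates) else None  # first day of the next one
        for day, px in contracts[i].items():
            if (start is None or day >= start) and (end is None or day < end):
                series.append((day, px + offset))
        if i > 0:
            roll = roll_dates[i - 1]
            # Shift older prices by the new-minus-old gap on the roll date.
            offset += contracts[i][roll] - contracts[i - 1][roll]
    series.sort()
    return series
```

Because earlier prices are shifted by a fixed amount, long back-adjusted histories can drift far from the actual traded levels (even below zero), so percentage-based statistics on them should be treated with care.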

 

As for software, I'd recommend MATLAB. It's simply a fantastic piece of software. It's very flexible, and the online community is huge and very helpful. The learning curve is a bit steep (especially if you don't have prior programming experience), but it's doable given the right motivation. Caveat: you'll have to program most of your indicators yourself; there aren't many that come standard out of the box. But, strange as it might seem, I found this rather helpful: it made me *think* about what those shiny lines on my screen actually meant instead of just trusting them because They Told Me To use them. That's just me, though; I don't really know R or Octave (a free MATLAB clone).

Hope this helps,

A

Edited by Avarice


Hello everyone,

Thank you all for your helpful replies, and please accept my apologies for the late reply. :)

"R is preferable because it is open source, which translates to more 'open and collaborative' user groups on the internet."

Thank you for your opinion and for those links. I'll spend time with them over the weekend. :)

As far as I can tell, the online community for MATLAB is no smaller than that for R. I've read a few debates on R vs MATLAB, and the consensus seems to be that MATLAB may be slightly better for the purposes I'm interested in and also easier to learn.

Then you'll hear others who say just the opposite. :)

Hello, and thank you for your input. :)

The problem with talking to geeks is that they may not be able to relate to us mere mortals when handing out advice; I'm referring to two of my acquaintances who are both proficient programmers. :)

Since I am not an Excel wizard yet and do not know any VBA, they told me that my time would be better spent moving on to MATLAB right now.

I have not decided yet, but I will probably buy MATLAB (student version), try to get my feet wet and see whether I feel it is something I can learn without too much effort. I know that I can learn it if I want to, but the question is how much time and effort I will need to put into it before I'm at a level where I can use it without too much pain.

Regardless of whether I choose R, MATLAB, Excel, etc., the first step would be to buy data?

You seem to know a few things here, so I would appreciate any advice on where to get it. I think 1-minute data is enough for now. The plan was to sign up with IQ Feed and use Qcollector to rip data from the feed. It seems like a very easy way to do it, including automatic updates and real-time integration, and it is also fairly cheap compared to many of the other vendors. I read reports from some users who experienced bad high/low values in the 1-minute historical data from IQ Feed, which was a little discouraging if it has not been improved by now.

You said, "get a download"; do you mean purchasing quality data from a vendor as opposed to ripping it off a feed?

One option I see could be to buy tick data and then use IQ Feed to build on that, since tick data is available from IQ Feed for 30 days. I am primarily interested in only two symbols, ES and CL, and buying those from a vendor could be affordable. Using only IQ Feed, I would have a lot more at my disposal.
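Combining purchased history with data collected going forward mostly comes down to aligning timestamps and resolving overlaps. A minimal Python sketch for bars, assuming both sources use the same timestamp format and time zone and that the purchased data should win where the two overlap (both are assumptions); `merge_bars` is a hypothetical name. The same idea applies to tick data, although several ticks can legitimately share a timestamp, so ticks cannot be deduplicated by timestamp alone.

```python
def merge_bars(vendor_bars, ripped_bars):
    """Combine purchased history with bars collected from a live feed.

    Both inputs are iterables of (timestamp, open, high, low, close, volume).
    Where both sources have a bar for the same timestamp, the vendor bar is
    kept, on the assumption that purchased data is the cleaner source.
    """
    merged = {bar[0]: bar for bar in ripped_bars}
    merged.update({bar[0]: bar for bar in vendor_bars})  # vendor overrides
    return [merged[ts] for ts in sorted(merged)]
```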

 

Hello Avarice,

Thank you, it does help indeed. I'll look over those links over the weekend, thanks. :)

Have you needed to go through this process of scrubbing your data from eSignal? My system is short-term, and I am absolutely dependent on accurate intraday data. I'm aware of the snapshots from IB, which is currently my data feed and broker. That's why I have to upgrade to a better feed, both for retrieving accurate historical data and for getting more accurate tick/volume charts in real time. My plan was to upgrade to IQ Feed and rip data from their database using Qcollector, but I've seen a few complaints about their 1-minute high/low values being out of whack. I need to do some more research there.
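Bad 1-minute high/low values can at least partly be caught with a consistency check. A minimal Python sketch, assuming bars are (open, high, low, close) tuples: it flags bars whose high is below the open or close, whose low is above them, or, optionally, whose range exceeds a hypothetical `max_range` in points. It cannot catch a wrong high or low that still looks plausible; cross-checking against a second source is the way to find those.

```python
def bad_bars(bars, max_range=None):
    """Return indices of OHLC bars that look internally inconsistent.

    bars: sequence of (open, high, low, close) tuples.
    max_range: optional limit on high - low, in points (hypothetical).
    """
    flagged = []
    for i, (op, hi, lo, cl) in enumerate(bars):
        if hi < max(op, cl) or lo > min(op, cl):
            flagged.append(i)  # high/low do not bracket the open and close
        elif max_range is not None and hi - lo > max_range:
            flagged.append(i)  # suspiciously wide bar
    return flagged
```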

 

Nice to hear that you are satisfied with MATLAB. I can understand what you mean by it being helpful to write the indicators yourself, to really understand them. For my current trading I do not use any indicators, save a simple 20-EMA that's really not that important.
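An EMA is a natural first indicator to write yourself. The usual update is EMA_t = alpha * P_t + (1 - alpha) * EMA_(t-1) with alpha = 2 / (N + 1), so a 20-EMA uses alpha = 2/21 (about 0.095). A minimal Python sketch; seeding with the first price is an assumption, and platforms that seed with a simple average of the first N prices will show different early values.

```python
def ema(prices, period=20):
    """Exponential moving average with smoothing factor alpha = 2 / (period + 1).

    Seeded with the first price (an assumption; other seeds are common).
    """
    alpha = 2.0 / (period + 1)
    out = []
    value = None
    for p in prices:
        # Each new value moves a fraction alpha of the way toward the latest price.
        value = p if value is None else alpha * p + (1 - alpha) * value
        out.append(value)
    return out
```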

 

How long did it take you to get up to speed on MATLAB? I know it's a fairly useless question to ask, as we're all different, but I'm asking anyway. :) I bought a MATLAB book from Amazon that looks great, but my shipment never seems to arrive. I plan on buying the software and following along in the book, to see whether it is something I feel I can learn without too much effort. I guess I need to find out for myself. :)

Thanks again!

Regards,

HighStakes


 

"Since I am not an Excel wizard yet and do not know any VBA, they told me that my time would be better spent moving on to MATLAB right now."

Probably yes... I use Excel because I don't feel it's worth the time and effort for me to learn something new, but other systems are definitely faster. My testing is well suited to Excel, as speed is not such an issue. I can comprehensively test 3000 bars of data that might generate 500 trades (a lot, I know, but this is what takes the time), and it takes 2-3 seconds, so that's not an issue. If I record all data comprehensively while it updates, rather than just in memory and arrays, it might take 12 seconds. You can build almost anything in it: indicators, tests, combined macros and switches. For example, when testing, do you want each trade to have a separate attached stop, or a newly derived one on every pass of the data? Excel makes that easy for me.
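The choice between a stop attached once at entry and one re-derived on every pass of the data can be expressed in a few lines. A minimal Python sketch for a long trade, assuming per-bar highs and lows after entry and a fixed stop distance in points; the trailing variant moves the stop up to the highest high minus that distance. Each bar is checked against the stop before its high is used to move the stop, a conservative assumption since intrabar order is unknown, and exits fill at the stop price, ignoring gaps and slippage. `long_exit` is a hypothetical name.

```python
def long_exit(entry_price, lows, highs, stop_dist, trailing=False):
    """Find where a long position is stopped out.

    Fixed mode: the stop is set once at entry_price - stop_dist.
    Trailing mode: after each bar, the stop is re-derived as that bar's
    high - stop_dist, and it only ever moves up.
    Returns (bar_index, exit_price), or None if the stop is never hit.
    """
    stop = entry_price - stop_dist
    for i, (low, high) in enumerate(zip(lows, highs)):
        if low <= stop:
            return i, stop  # assumes a fill exactly at the stop
        if trailing:
            stop = max(stop, high - stop_dist)
    return None
```

With the same bars, the two modes can give very different trades, which is why it is worth testing both.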

I like the fact that I know Excel, it integrates well with other programs, and I can then do almost any pre- and post-data analysis. It's a one-stop shop, so I have never really investigated other options. But if you're starting from scratch, other options may be the way to go.

Do what works for you. Geeks will always make fun of those using an old-fashioned hammer, but if it works, it gets the job done. :)

Please keep me informed about your progress; maybe I will look to expand my horizons.

"Regardless of whether I choose R, MATLAB, Excel, etc., the first step would be to buy data?"

...

"You said, 'get a download'; do you mean purchasing quality data from a vendor as opposed to ripping it off a feed?"

I would just get some free historical data from any source and build your system FIRST.

There's not much point wasting money and time worrying about the data if you never build a system. Once you have that, getting the data is a whole other kettle of fish.

Getting a download is just a mass download of data, rather than waiting to collect and store your own over the next few months.

As others mention, accurate data and what you require in your data is the next step. I would worry about that later (thinking about it beforehand is good); build the system first. You can spend/waste a lot of time on data if you never build anything.

Well, I took the plunge and have now bought MATLAB. I was pretty much already convinced it was the way to go, and after watching several of the webinars on their site, I no longer had any doubts. It integrates very well with Excel, so I will probably use both. I paid $205 for the student version with the toolboxes that I want to use, so it is very affordable.

Where do I get free historical intraday data?

I'm upgrading to IQ Feed anyway, and it will be no problem to start off with the data from their servers.

For now, I have more than enough to do learning the basics of MATLAB, but I don't think it will be long until I need decent historical intraday data for the work that I plan on doing.

Does anyone else reading know of any good forums or resources for MATLAB beyond what is at MathWorks?

Thanks in advance,

HighStakes
