By combining different data sources (worldwide indices, moving averages, oscillators, clustering or categorization of financial news), an investor could make better decisions on where and when to invest. After such an analysis, our goal is to make predictions that are clearly better than mere chance.
Some days ago I came across a website called Inner8. Inner8 is a really interesting idea: collaborative filtering of stock picking. Combine this with analytics and an investor has yet another investing tool in their arsenal. Imagine thousands of Inner8 subscribers making stock predictions and sharing their ideas, insights and sentiment about the stock market. After a few months some users will be “prediction super stars” by mere chance, so one has to proceed with caution. Nevertheless, it is a website to keep an eye on in the future, especially if the subscriber volume increases significantly.
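To get a feeling for how many “super stars” pure luck can produce, here is a rough sketch. It assumes every user makes the same number of independent up/down calls and guesses right with probability 0.5. The user count, the number of calls and the threshold are made-up numbers for illustration, not Inner8 figures.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of at least k correct calls out of n independent calls,
    each correct with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers, chosen only for illustration.
n_users = 10_000    # subscribers making predictions
n_calls = 60        # up/down calls per user over a few months
threshold = 40      # correct calls needed to look like a "super star"

# Chance that one coin-flipping user reaches the threshold.
p_star = prob_at_least(threshold, n_calls)

# Expected number of users who reach it by luck alone.
expected_stars = n_users * p_star

print(f"P(single lucky user) = {p_star:.4f}")
print(f"Expected lucky 'super stars' = {expected_stars:.1f}")
```

The more subscribers there are, the more lucky track records show up, so a good record on its own says little until it is compared against this kind of baseline.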
So let us go back to our problem: we have to think of a good way to combine the information in our possession (i.e. the problem representation) and feed this data to one or more algorithms, with the goal of obtaining models of high predictive value.
Some of the things to consider:
1) Should the “sliding window” technique be used? Could repetition of training data (sliding-window training repeats the same observations across consecutive windows) affect the predictive power of the model?
2) How many variables should be used? Which of them are good predictors?
3) Do we care only about the predictive power of the model? What about interpreting why a stock behaves as it does?
4) How can we represent the “additive effect” of two consecutive days of bad market news if a sliding window is not used?
5) Prediction goal: are we after price prediction (regression) or price limits (classification)?
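To make questions 1, 4 and 5 more concrete, here is a minimal sketch of one possible sliding-window representation. It assumes the only input is a list of daily closing prices. The function name, the window size and the toy prices are my own choices for illustration, not a recommended setup.

```python
def make_windows(closes, window=3):
    """Turn a list of closing prices into sliding-window training rows.

    Each row is (features, next_return, label) where:
      features    = the last `window` daily returns
      next_return = the following day's return (regression target)
      label       = 1 if next_return > 0 else 0 (classification target)
    """
    # Daily returns: relative change from one close to the next.
    returns = [(b - a) / a for a, b in zip(closes, closes[1:])]
    rows = []
    for end in range(window, len(returns)):
        features = returns[end - window:end]
        next_return = returns[end]
        rows.append((features, next_return, int(next_return > 0)))
    return rows

# Toy prices, made up for illustration.
closes = [10.0, 10.5, 10.4, 10.8, 11.0, 10.9]
rows = make_windows(closes, window=3)

for features, next_return, label in rows:
    print([round(f, 4) for f in features], round(next_return, 4), label)
```

Consecutive rows share all but one of their features, which is exactly the repetition that question 1 worries about. On the other hand, two bad days in a row appear side by side in the same row, so the learner at least has a chance to pick up their combined effect.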
Unfortunately the list does not end here: since I am after predictions of stock prices on the Greek Stock Exchange, the data should be presented to the learning algorithm in a coherent way. European markets are affected by the close of the US and Asian markets. The US markets also open during Greek trading hours (at 16:30 EET, approx. 45 minutes before the end of trading), a fact that should be taken into account as well.
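One way to keep the data coherent is to pair each Greek trading day only with information that was already known at that time. The sketch below assumes that the US close of the same calendar day is not yet available during Greek trading, so it picks the latest US close from a strictly earlier date. The function name, dates and index values are hypothetical.

```python
from bisect import bisect_left
from datetime import date

def previous_us_close(greek_day, us_closes):
    """Return the latest US close dated strictly before `greek_day`.

    `us_closes` is a list of (date, close) tuples sorted by date.
    Returns None when no earlier US session exists.
    """
    dates = [d for d, _ in us_closes]
    # Index of the first US session on or after greek_day.
    i = bisect_left(dates, greek_day)
    return us_closes[i - 1][1] if i > 0 else None

# Made-up closes for a Thursday and a Friday.
us_closes = [
    (date(2024, 1, 4), 100.0),
    (date(2024, 1, 5), 101.0),
]

# A Monday in Athens sees Friday's US close, not a same-day one.
print(previous_us_close(date(2024, 1, 8), us_closes))
```

The same lookup would work for Asian indices, although whether the same-day Asian close is already known depends on each market's trading hours. The partial US session after 16:30 EET would need intraday data, which this sketch does not cover.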
I am sure that there are many users out there who have read a couple of data mining books, downloaded an open-source data mining tool, fed some data in and now expect to see results. My only advice to them, without the slightest hint of criticism: paper-trade first…