St. Petersburg University
Graduate School of Management
Master in Management Program
IBM Watson Analytics vs. Conventional Econometrical
Software: A Comparative Analysis of Suitability for
Financial Sector
Master’s Thesis by the 2nd year student
Concentration — Information
Technologies and Innovative Management
Ilias Faizullov
Research advisor:
Associate professor, Sergey A. Yablonsky
St. Petersburg
2016
STATEMENT ON THE INDEPENDENT COMPLETION OF THE
GRADUATION QUALIFICATION WORK
I, Ilias Rafisovich Faizullov, second-year student of the Master program in «Management», declare that my master's thesis on the topic «IBM Watson Analytics and Standard Econometric Packages: A Comparative Analysis of Suitability for the Financial Sector», submitted to the Master Office for subsequent transfer to the State Attestation Commission for public defense, contains no elements of plagiarism.
All direct borrowings from printed and electronic sources, as well as from previously defended graduation qualification works and from candidate and doctoral dissertations, have appropriate references.
I am aware of the content of clause 9.7.1 of the Rules of study in the major educational programs of higher and secondary professional education at St. Petersburg University, which states that «the graduation qualification work is completed individually by each student under the supervision of an assigned research advisor», and of clause 51 of the Charter of the Federal State Budgetary Educational Institution of Higher Education «Saint Petersburg State University», which states that «a student is subject to expulsion from St. Petersburg University for submitting a course or graduation qualification work completed by another person (or persons)».
_______________________________________________ (Student's signature)
25.05.2016 (Date)
STATEMENT ABOUT THE INDEPENDENT CHARACTER OF
THE MASTER THESIS
I, Faizullov Ilias, second year master student, program «Management», state that my master
thesis on the topic «IBM Watson Analytics vs. Conventional Econometrical Software: A
Comparative Analysis of Suitability for Financial Sector», which is presented to the Master
Office to be submitted to the Official Defense Committee for the public defense, does not
contain any elements of plagiarism.
All direct borrowings from printed and electronic sources, as well as from master theses,
PhD and doctorate theses which were defended earlier, have appropriate references.
I am aware that according to paragraph 9.7.1 of the Guidelines for instruction in major
curriculum programs of higher and secondary professional education at St. Petersburg University,
«A master thesis must be completed by each of the degree candidates individually under the
supervision of his or her advisor», and according to paragraph 51 of the Charter of the Federal State
Institution of Higher Education Saint Petersburg State University, «a student can be expelled
from St. Petersburg University for submitting a course or graduation qualification work
developed by another person (or persons)».
________________________________________________(Student’s signature)
25.05.2016 (Date)
Table of contents
Introduction..................................................................................................................................... 4
Chapter I. The state of the art in predictive analytics................................................................... 6
1.1 Predictive analytics and big data.............................................................................................. 6
1.2 Predictive analytics..................................................................................................................10
1.3 Social Media and Business news Analytics.............................................................................12
1.4 Market of predictive analytics tools in financial sphere..........................................................13
1.5 Research gap............................................................................................................................14
1.6 Research methodology and organization of the study............................................................. 15
1.7 Conclusion of Chapter I...........................................................................................................16
Chapter II. Research framework....................................................................................................18
2.1 Research goals, KPIs, objectives, questions, and limitations.................................................. 18
2.2 Methods of evaluation of advanced analytical platforms........................................................ 20
2.3 Methods of comparing the forecasting accuracy of IBM Watson and statistical packages.....21
2.4 Method of currency exchange rate forecasting using Statistical Packages.............................22
2.5 Methods of stock forecasting using Statistical packages..........................................................23
2.6 Conclusion of Chapter 2........................................................................................................... 25
Chapter 3. Empirical estimation of analytical platforms............................................................... 26
3.1 Evaluation of the Analytical Platforms....................................................................................26
3.1.1 Justification of the choice of analytical platforms taken for consideration..........................26
3.1.2 Results of Evaluation of Analytical Platforms......................................................................28
3.2 Evaluation of the forecasting accuracy of IBM Watson Analytics.......................................... 30
3.2.1 Data description.................................................................................................................... 30
3.2.2 Forecasting stock prices with theoretically based models....................................................32
3.2.2.1 Results of the Random walk models for currencies........................................................... 32
3.2.2.2 Currency’s exchange rates forecasting using factor models........................................... 33
3.2.2.3 Stock forecasting using CAPM model.............................................................................. 34
3.2.3 Forecasting stock market using IBM Watson analytics........................................................35
3.2.3.1 Models for stock forecasting............................................................................................. 35
3.2.3.2 Models for currency’s exchange rate forecasting........................................................... 39
3.2.3.3 Analysis of the results of stock price forecasting.............................................................. 42
3.3 Conclusion of the Chapter 3.................................................................................................... 43
Final Conclusions.......................................................................................................................... 44
Discussion of the findings............................................................................................................. 44
Theoretical implications................................................................................................................ 45
Managerial implications................................................................................................................ 46
Limitations.....................................................................................................................................46
List of references........................................................................................................................... 47
Appendix 1. Specifications of Models.......................................................................................... 50
Appendix 2. Specification of models, suggested by Watson Analytics.........................................53
Appendix 3. Results of the IBM Watson Analytics Predict function for currencies.....................65
Appendix 4. Results of the IBM Watson Analytics Predict function for stocks............................68
Introduction
The market for financial analytics is a fast-growing niche. The Financial Analytics
Market forecast conducted by Research and Markets (2014) estimated that by 2018 the total
market value of financial analytics would reach 6.65 billion dollars. At the moment, many
different players are struggling to capture a share of this market, among them such renowned
giants as IBM and Microsoft.
As Srivastava (2015) states, this rapid growth of the financial analytics market is driven
by the need of financial organizations to manage increasing amounts of structured and
unstructured information coming from different sources. In other words, the emergence of big
data creates a market for advanced analytics.
One sphere of financial analysis that attracts the attention of both financial organizations
and individual traders is stock price forecasting. The main characteristic of any financial asset,
available to all market participants, is its price. Prices can take the form of purchase prices of
bonds and stocks, currency exchange rates, or interest rates on bank deposits. The whole set of
these values at any given moment in time comprises the conjuncture of the market. There are
three classic methods of predicting stock price dynamics: Technical Analysis, Fundamental
Analysis, and Quantitative Analysis.
According to Schwager (1996), technical analysis is based on the examination of
historical trends in the market, represented by market statistics of stock prices and volumes.
Technical analysis operates under the assumption that all available and relevant information,
including so-called fundamental factors, is reflected in the asset's price. In addition, a technical
analyst assumes that some patterns of the stock market are repetitive and can be revealed using
indicators, oscillators, and other "technical" methods. The shortcoming of this approach is the
absence of a systematic, scientific basis for the majority of its empirical methods.
Another approach is fundamental analysis, which is based on the evaluation of
fundamental macroeconomic and microeconomic factors. Niemira (1998) claims that
fundamental analysis focuses on the condition of the issuer: its revenues, market position, and so
on. Macroeconomic factors influencing the whole industry and country (GDP, unemployment
rates, etc.) are also taken into consideration.
The third classic approach to stock market analysis, as described by Cuthbertson (1996),
is quantitative analysis. Like technical analysis, it is based on statistical data, but instead of
indicators it uses statistical and mathematical models and tools, which are also referred to as
econometric.
In recent years a new approach has emerged: predictive analytics. It has gained attention
due to the increasing amount of available, market-relevant information. Mark E. (2006)
estimated that in 2007-2009 humanity would generate more information than in the previous
1,000 years. This information overload gave rise to the term "Big Data", which refers to
high-volume, high-velocity, and high-variety data. Predictive analytics is quantitative analysis
per se, but with the ability to operate on big data. It uses the same statistical and mathematical
tools as quantitative analysis; however, it differs in the research approach: while standard
econometric models merely test pre-generated, theory-based hypotheses, predictive analytics is
capable of finding correlations between variables in huge datasets without a preliminary
hypothesis, i.e., it generates its own statistical hypotheses from the data.
Big Data creates challenges as well as opportunities, and financial organizations such as
banks have a lot to gain from analyzing it. As Tian (2015, 34) argues: “The large scale of data
contain enormously valuable information, and analytics based on big data can provide financial
organizations with more business opportunities and the possibility to gain a more holistic view of
both market and customers. Big data analytics can benefit banking and financial market firms in
many aspects, such as accurate customer analytics, risk analysis and fraud detection. These
approaches can lead to smarter and more intelligent trading, which can help organizations to
avoid latent risks and provide more personalized services, thus to get a higher degree of
competition advantage”. The challenge of analyzing vast amounts of high-volume, high-velocity,
and high-variety data, presented in both unstructured and structured forms, creates the need for
an advanced analytical tool.
Nowadays, there are multiple analytical platforms available to banking and other
financial organizations. Giants such as IBM, Microsoft, Google, and Amazon offer their
analytical products to the market. According to Gartner's Magic Quadrant for Advanced
Analytics Platforms (2014), the leading positions on the market belong to IBM, RapidMiner,
and SAS. Microsoft is lagging behind, but over the past two years it has shown positive
momentum and is now catching up with the leaders.
The goal of this research is to determine which of these analytical platforms is the best
fit for the purposes of stock market forecasting. In the theoretical part, we will discuss the
influence of big data and predictive analytics on financial organizations' operations. Then we
will define the requirements these organizations place on an analytical platform and generate a
set of KPIs to evaluate the platforms.
Among other KPIs, we will pay attention to the ability of analytical platforms (using
IBM Watson Analytics as an example) to generate predictive models for stock price forecasting.
We will compare the results with the outcomes of some traditional, theoretically based
econometric models.
Chapter I. The state of the art in predictive analytics.
1.1 Predictive analytics and big data.
Predictive analytics is connected with the term "Big Data", which has become popular in
the past decade, as shown by Jianzheng (2016); Figure 1 illustrates the rising academic interest
in the subject.
Figure 1. Dynamics of the number of published studies on Big Data. Source: Jianzheng
(2016).
There is confusion among executives around the world regarding what Big Data really
is. As shown in Figure 2, according to research conducted by SAP (2012), the majority of
executives perceive big data as an increased amount of customer-related information that
requires processing (28% of respondents), and almost a quarter associate Big Data with the
technologies for processing vast amounts of information.
TechAmerica Foundation defines big data as follows: “Big data is a term that describes
large volumes of high velocity, complex and variable data that require advanced techniques and
technologies to enable the capture, storage, distribution, management, and analysis of the
information.”
Another definition of big data can be found in the Gartner IT Glossary: “Big data is
high-volume, high-velocity and high-variety information assets that demand cost-effective,
innovative forms of information processing for enhanced insight and decision making.”
Figure 2. Definitions of big data based on an online survey of 154 executives in April
2012. Source: SAP (2012)
Both definitions describe Big Data as data possessing three qualities, also called the
three V's: Volume, Variety, and Velocity.
Volume is a relative characteristic of Big Data, as it tends to increase over time: what is
considered a huge volume today may not meet the requirements of being "Big" in the future.
For example, in 2012 a dataset over a terabyte was considered Big Data, says Schroeck, M.
(2012).
Variety means structural heterogeneity of Big Data, which consists of many data
formats. As Cukier, K. (2010) claims, only around 5% of data is structured; the other 95% is
unstructured, represented mostly by audio, video, and text formats. Unstructured data cannot be
analyzed by machines directly; therefore, it poses a serious challenge for an analyst.
Velocity refers to the speed at which data is generated. The rise of digital technologies
has increased the rate of information generation, making analysis of the market even more
complicated.
There are additional V's of Big Data, introduced by IBM, SAS, and Oracle: Veracity,
Variability and Complexity, and Value.
Veracity refers to the unreliability of the data; for example, social media sources are
unreliable by nature, as they are generated by the broad masses of people.
Variability and Complexity refer to the unsteady rate of information generation and the
diversity of the sources it comes from. Analyzing multiple information flows, which come at
different rates and have their own cycles, troughs, and peaks, drives the need for advanced
analytics tools.
Finally, the last V is Value. Big data is characterized by a low share of valuable
information; nevertheless, the overall value of the whole dataset is high, as the volume is
immense, which also supports the need for an appropriate analytical tool.
All these V's are not constant: they vary over time and across industries, and they are
also interdependent; if one changes, the others are influenced as well.
It would be a mistake to pay attention only to the first V, volume; the other V's are no
less important. As Jagadish (2015, 50) claims, the main reason the volume of data gets more
attention is that it is easily measurable, unlike variety and veracity: “I have discussed above, why Volume
(or size) gets undue attention. Let me turn now to why I think Variety and Veracity do not get the
attention they deserve. One major reason for this lack of attention is that there is no well-accepted measure for either. If there is no measure, it is hard to track progress. If I have a
company and develop an innovative system that can handle a slightly larger volume than the
competition, I can show this off with measurements against some benchmark. If I am an
academic and develop an algorithm that scales better than the competition, I know exactly how
to compare my algorithm against the competition and persuade skeptical reviewers. In contrast,
consider variety. If I have a product that makes handling variety a little easier, what technical
claim can I make that doesn’t sound like marketing hype? If I write a paper about a data model
that is better at handling variety than the current state of the art, I have to think very hard about
how I will compare against the competition and establish the goodness of my idea. Progress is
hard in things you cannot measure, in both industry and academia. Variety may be the hardest of
the 4Vs to address, but it is the one that people are least motivated to speak about.”
Different techniques exist for the different types of big data being analyzed (structured
or unstructured). The types of Big Data analysis methods are as follows: text analytics, audio
analytics, video analytics, social media analytics, and predictive analytics.
Audio analytics mostly consists of speech analysis, aimed at tracking customer feedback,
as Gandomi (2015, 141) describes: “Call centers use audio analytics for efficient
analysis of thousands or even millions of hours of recorded calls. These techniques help improve
customer experience, evaluate agent performance, enhance sales turnover rates, monitor
compliance with different policies (e.g., privacy and security policies), gain insight into customer
behavior, and identify product or service issues, among many other tasks. Audio analytics
systems can be designed to analyze a live call, formulate cross/up-selling recommendation based
on the customer’s past and present interactions, and provide feedback to agents in real time.”
Video analytics is the least developed branch, but it bears potential for customer
behavior analysis, as Gandomi (2015, 142) states: “…potential application of video analytics in
retail lies in the study of buying behavior of groups. Among family members who shop together,
only one interacts with the store at the cash register, causing the traditional systems to miss data
on buying patterns of other members. Video analytics can help retailers address this missed
opportunity by providing information about the size of the group, the group’s demographics, and
the individual members’ buying behavior.”
Text, social media, and predictive analytics are relevant for stock market forecasting, so
we will discuss them briefly.
Text analytics deals with all kinds of written sources, such as news, blogs, emails,
documents, and so on. It derives the main ideas out of huge amounts of textual data by creating
summaries. Chung (2014) supports the idea that this technique can be used for stock market
forecasting, as it can forecast price movements based on financial experts' sentiments.
According to Gandomi A. (2015), text analytics techniques include:
1. Information extraction – converting unstructured textual data into structured data.
2. Text summarization – a technique that generates meaningful summaries from
texts using Natural Language Processing methods.
3. Question answering – another technique using Natural Language Processing
methods. It provides answers to questions formulated in natural language by
going through three steps: question processing, text processing, and answer
processing.
4. Sentiment analysis – a method aimed at deriving an aggregated customer or
expert opinion regarding some product or event. It operates by classifying
opinions as either negative or positive; then, based on the scores of these two
classes, the overall sentiment is determined (see the sketch below).
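To make the scoring step concrete, here is a minimal, hypothetical sketch in Python. The word lists and the aggregation rule are toy assumptions for illustration; they are not the method of any particular platform discussed in this thesis.

```python
# A toy lexicon-based sentiment scorer: classify each opinion as positive,
# negative, or neutral, then aggregate the class scores into one sentiment.
POSITIVE = {"good", "great", "strong", "beat", "upgrade", "bullish"}
NEGATIVE = {"bad", "weak", "miss", "downgrade", "bearish", "loss"}

def score_text(text):
    """Return +1 (positive), -1 (negative), or 0 (neutral) for one opinion."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos > neg) - (pos < neg)

def aggregate(texts):
    """Overall sentiment as the mean of the individual class scores."""
    scores = [score_text(t) for t in texts]
    return sum(scores) / len(scores) if scores else 0.0

print(aggregate(["Strong quarter, analysts upgrade the stock",
                 "Weak guidance and a surprise loss"]))  # prints 0.0
```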
Social media analytics is used primarily for marketing purposes, such as customer
satisfaction analysis, community detection, etc., as social networks provide great opportunities
for target audience analysis. However, it can also be used for stock market forecasting: for
example, Antweiler W. (2004) conducted a study showing that the Yahoo Finance message
board could be used for stock price prediction.
Finally, there is predictive analytics, which includes a variety of quantitative methods
that can be used to predict almost anything, from crime rates to stock market volatility.
Predictive analytics techniques fall into two categories: autoregression and regression
analysis. The first discovers patterns within the history of the chosen variable itself; the second
explores dependencies between different variables.
Increased academic attention to Big Data can be explained by the advancement of
computing technologies. Modern data mining tools have made it possible for researchers to work
with huge amounts of structured and unstructured data. Christine E. Earley (2015, 494) supports
this statement: “The availability of large amounts of computerized data in companies has been
steadily increasing over the years, but recent advances in processing speed, cloud storage, and
the rise of social networks has changed the ease of access to data and the nature of data that can
be captured and stored for later use. At the same time, software used to analyze large volumes of
data (i.e., data mining tools) as well as more sophisticated data visualization tools can potentially
increase the ability of individuals to understand the story that the data is telling them”.
1.2 Predictive analytics.
Matlis J. (2006, 42) defines predictive analytics as follows: “Predictive
analytics is the branch of data mining concerned with forecasting probabilities. The technique
uses variables that can be measured to predict the future behavior of a person or other entity.
Multiple predictors are combined into a predictive model. In predictive modeling, data is
collected to create a statistical model, which is tweaked as additional data becomes available.”
As is evident from the definition, predictive analytics uses the same statistical methods
as quantitative analysis; the difference between them lies in the sequence of the research steps.
Joe F. (2007) describes the processes of quantitative analysis and predictive analytics as follows:
Quantitative analysis steps:
1. Theory
2. Hypotheses Development
3. Test
Predictive analytics steps:
1. Data
2. Relationships Development
3. Hypotheses
4. Model Building and Hypothesis Testing
5. Model Validation
As we can see, predictive analytics offers more possibilities for analysis, as it can find
interdependencies that would otherwise have been overlooked.
The difference between predictive analytics and quantitative analysis can be represented
from an explanatory vs. predictive modeling perspective.
Explanatory statistical models test predefined, theory-based hypotheses. The role of
explanatory statistics is to show the causal dependencies between variables. To build an
explanatory model, one should first identify the cause-and-effect relationships between
variables, and then build a model to test the hypotheses. In other words, explanatory statistics is
used to prove that the revealed connections between the factors and the dependent variable are
relevant. To evaluate such models, analysts use statistical measures, such as R-squared, which
gauge the explanatory power of a model.
Predictive models have a different construction mechanism: instead of focusing on
theory-based causal links between variables, predictive models are based on associative links.
Predictive analysis, unlike explanatory analysis, starts with the data. It then looks for
associations between variables within the dataset and builds forecasts based on the findings.
Evaluation of predictive models is based on measuring predictive accuracy instead of
explanatory power, as the sketch below illustrates.
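The contrast can be shown in a short sketch: one and the same linear model is judged once by its in-sample fit (the explanatory view) and once by its accuracy on held-out observations (the predictive view). The data here are synthetic and purely illustrative.

```python
# Explanatory vs. predictive evaluation of the same linear model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# Explanatory view: variance explained on the data the model was fit to.
print("in-sample R^2:", round(model.score(X_train, y_train), 3))
# Predictive view: accuracy on observations the model has never seen.
print("out-of-sample R^2:", round(model.score(X_test, y_test), 3))
```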
Shmueli (2010) points out four criteria that distinguish predictive from explanatory
analytics: "… causation-association, theory-data, retrospective-prospective, and bias-variance".
The bias-variance perspective refers to the different evaluation criteria for predictive and
explanatory models: the first seeks to minimize sample variance, whereas the latter minimizes
the model's bias.
Both approaches (explanation and prediction) are hardly compatible within a single
model, as the best explanatory model is not the best predictive one, argues Konishi S. (2007),
even though it has some level of predictive power.
Predictive models increase their accuracy at the cost of higher bias; therefore, predictive
models are not necessarily "true", in the sense that there may be no theoretical foundation for
them. Since predictive analytics operates on big data, it inevitably faces challenges, which
Fan, J. (2014) identifies as follows:
1. Heterogeneity. Data obtained from multiple sources and in different formats
creates additional difficulties for an analyst.
2. Noise accumulation. Predictive models are built using multiple factors at the
same time, and the accumulated errors create "noise" that can conceal the true
influence of some factors.
3. Spurious correlation. Due to the huge sizes of datasets and the multiple variables
being analyzed, false correlations may be detected (see the demonstration after
this list).
4. Incidental endogeneity. This is the threat of breaking one of the traditional
assumptions of regression analysis, exogeneity: some of the predictive
factors may be dependent on the residual term.
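Point 3 is easy to demonstrate: in the sketch below all variables are generated independently, yet the largest pairwise correlation found in the dataset grows steadily with the number of variables. The sample sizes are arbitrary illustrative choices.

```python
# Spurious correlation: with many variables and few observations,
# some pair will appear correlated purely by chance.
import numpy as np

rng = np.random.default_rng(1)
n_obs = 50
for n_vars in (10, 100, 1000):
    X = rng.normal(size=(n_obs, n_vars))  # all columns truly independent
    corr = np.corrcoef(X, rowvar=False)   # pairwise correlation matrix
    np.fill_diagonal(corr, 0.0)           # ignore self-correlations
    print(n_vars, "variables -> max |correlation|:",
          round(float(np.abs(corr).max()), 2))
```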
Application areas of predictive analytics range from business-related topics, such as
retail, marketing, and finance, to healthcare and environmental issues. Retailers use predictive
analytics to forecast demand for particular products. Marketers use it to create customer
profiles, to determine the public's reaction to new products, and to detect customer
communities. Law enforcement agencies use it to predict the occurrence of crimes, healthcare
systems employ predictive analytics to make more precise diagnoses, and customs agencies use
it for fraud detection.
There are numerous possible applications of data analytics. Banks and other financial
organizations also have much to gain from predictive analytics. Today, big data challenges both
firms and individual traders, and those capable of rapidly extracting and analyzing relevant
information will gain a competitive edge. As a report from SAP (2012) states: “…the
profitability keeps falling in recent years, and organizations are now evolving towards smart
trading based on big data analytics. Besides designing more complex computing model and
system, how to make such large scale computation real time is still a very important problem that
is needed to be considered seriously”.
1.3 Social Media and Business news Analytics
There is a subset of Big Data that refers to data derived from social media: social big
data.
Bello, O. (2016, 47) defines social big data as follows: “Those processes and methods
that are designed to provide sensitive and relevant knowledge to any user or company from
social media data sources when data sources can be characterized by their different formats and
contents, their very large size, and the online or streamed generation of information.”
Methods of processing social big data constitute social big data analytics, which is
defined by Bello, O. (2016, 47) as follows: “Social big data analytic can be seen as the set of
algorithms and methods used to extract relevant knowledge from social media data sources that
could provide heterogeneous contents, with very large size, and constantly changing (stream or
online data). This is inherently interdisciplinary and spans areas such as data mining, machine
learning, statistics, graph mining, information retrieval, and natural language among others. This
section provides a description of the basic methods and algorithms related to network analytics,
community detection, text analysis, information diffusion, and information fusion, which are the
areas currently used to analyze and process information from social-based sources.”
Social big data may be of use not only for companies that trade in consumer goods, but
also for the financial and banking sector.
Assets' prices are determined not by impartial machines but by the individuals who trade
on the stock exchange. Like any human beings, they are not completely rational; their decisions
are influenced by the public's mood and by rumors.
The advancement of analytical applications has made it possible for researchers to
include psychological factors in their predictive models. Tracing these factors is challenging,
since they are hidden in huge amounts of unstructured data. One such factor is customers'
sentiments and opinions about a company or product.
People's expectations and opinions about a particular company or product are reflected
on social media platforms such as Twitter and Facebook. Models for stock price prediction
based on an analysis of the public's mood were built and tested in academic articles such as
Bollen, Mao, and Zeng (2011) and Wu He (2015).
Johan, B. (2011) showed that even the Dow Jones Industrial Average index could be
predicted by analyzing Twitter mood. The first step in constructing a predictive model based on
information derived from social media is the extraction of public sentiment; there are various
software tools for that purpose, including IBM Watson. The second step is data processing,
done by assigning scores or dimensions to every observation. Scores could be "positive",
"neutral", "negative", or take other forms. After transforming the initial unstructured data into
structured scores, the usual statistical methods can be applied, as the sketch below illustrates.
Using the same technique, one can build a prediction model based on machine processing of a
vast number of business news articles; such a model was built by Chowdhury (2014). The
forecasting accuracy of public sentiment models varies from 70 to 80%.
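As a hypothetical illustration of this two-step approach, the sketch below maps sentiment labels to numeric scores and regresses the next day's price change on the previous day's aggregate mood. The label mapping and all numbers are invented; no claim is made about the pipelines of the cited studies.

```python
# Step 1: turn sentiment labels into structured numeric scores.
# Step 2: apply an ordinary regression to the scored series.
import numpy as np

LABEL_TO_SCORE = {"positive": 1, "neutral": 0, "negative": -1}

daily_mood = ["positive", "positive", "negative", "neutral",
              "negative", "positive", "neutral", "positive"]
price_change = np.array([0.4, 0.6, -0.5, 0.1, -0.3, 0.5, 0.0, 0.2])

scores = np.array([LABEL_TO_SCORE[m] for m in daily_mood])

# Regress today's price change on yesterday's sentiment score.
x, y = scores[:-1], price_change[1:]
slope, intercept = np.polyfit(x, y, deg=1)
print(f"price_change ~ {intercept:.2f} + {slope:.2f} * lagged_sentiment")
```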
1.4 Market of predictive analytics tools in financial sphere
The market for financial analytics is a fast-growing niche. The Financial Analytics
Market forecast conducted by Research and Markets (2014) estimated that by 2018 the total
market value of financial analytics would reach 6.65 billion dollars. Such rapid growth is driven
by the impact of big data on the operations of banks, audit firms, and other types of financial
organizations.
Nowadays, researchers point out the importance of predictive analytics for all
organizations. For example, Ventana Research (2016, 3) states: “Organizations increasingly need
to understand what's happening right now and to be able to forecast what is likely to happen in
both the near future and the long term.” Ventana Research (2016) sees predictive analytics as a
means to serve this need. Currently, there are multiple providers of analytical tools on the
market, among them such giants as IBM, Microsoft, Google, and Amazon.
According to Doug, H. (2015), the leading positions on the market of analytics platforms
belong to IBM, KNIME, RapidMiner, and SAS. Microsoft is lagging behind, but over the past
two years it has shown positive momentum and is now catching up with the leaders.
One of IBM's products, the Statistical Package for the Social Sciences (SPSS), became
particularly popular among researchers and data scientists. SPSS embeds a vast array of
statistical tools and provides customers with the ability to apply econometric modeling to their
data. SPSS offers everything an analyst needs, but its main drawback is the demands it places
on the user: only a qualified specialist with expertise in statistics and econometrics can use
SPSS properly.
Apart from SPSS, IBM offers another service, available through the cloud: IBM Watson
Analytics. Watson Analytics provides customers with natural-language predictive and visual
analytics. It includes data storage, data processing, data analysis, and visualization. In addition,
it can run social media analysis (Twitter), helping to assess the public's sentiment towards any
given event, company, or product.
Three key properties of IBM Watson Analytics are as follows:
1. A comprehensive range of services: unlike other analytical tools, which are
designed to solve particular types of business tasks, Watson Analytics helps to
refine data, evaluate its quality, analyze it, and create a report, thus rendering
the use of other tools unnecessary.
2. Predictive analytics: IBM Watson automatically determines the most relevant
data and reveals interconnections between variables.
3. Use of natural language: IBM Watson allows users to ask questions in plain
English, thus making it possible for a person without knowledge of statistics
to work with the data.
Microsoft's, Amazon's, Google's, and SAS's predictive analytics offerings are
represented by Azure Machine Learning, AWS Machine Learning, the Google Prediction API,
and SAS Visual Analytics, respectively.
In essence, they are analogues of IBM Watson: all of them provide visualization,
analytical, and predictive services accessible through the cloud. An important feature of these
products is that they offer predefined analytical models for particular business needs: banking,
insurance, retail, etc. Unlike IBM Watson Analytics, they provide customers with the ability to
develop their own applications for very specific purposes. There are many other players in the
advanced analytics market, such as Prognoz, SAP, and Oracle, but they occupy niche markets.
1.5 Research gap.
The influence of big data and the applications of predictive analytics in different spheres
of business, healthcare, and public safety have gained some attention in the past few years.
However, marketing has received the most attention: analyzing customer feedback, detecting
communities via social media, and demographic profiling of customers.
The influence of big data and predictive analytics on the financial and banking sectors
has been noticed in academic circles. Some academic papers, such as Earley, E. (2015),
Yoon, H. (2015), and Min, C. (2015), address the opportunities and challenges of big data
analytics for auditing. Other studies, such as Srivastava, U. (2015), analyze the application of
big data analytics in the banking sector, but they mostly cover customer profiling, risk
management, and fraud detection. Smith (2015) and Bologa (2010) have discussed the influence
of big data and big data analytics on the insurance sector.
Kwan, M. (2014) and Ruta, D. (2014) brought to the attention of academics the problem
of the applicability of big data analytics and predictive analytics for increasing the effectiveness
of trading operations on the stock market.
However, their research only stated the opportunities and challenges of big data in
trading; they did not run an empirical check and did not compare the analytical platforms
available on the market. Both an information deficit and an abundance of information can make
it hard for a trader to decide on a trading strategy. The profit of an individual trader, bank, or
brokerage firm depends on how quickly and effectively relevant information is extracted from
high-volume datasets of unstructured and structured data. The rise of big data creates a need for
effective and reliable methods and tools for processing vast amounts of market data.
All of the aforementioned authors have identified possible implications of big data
analytics for banking, audit, and insurance, but there is still room for research whose goal is to
find out how a particular type of financial organization (bank, auditor, insurer, or trader) could
achieve its business objectives using particular types of advanced analytical platforms.
The goal of this research is to fill this gap by assessing possible applications of
predictive analytics for stock market forecasting.
1.6 Research methodology and organization of the study.
In the course of this research, we will use quantitative methods to analyze and compare
the forecasting abilities of the leaders of the advanced analytical platforms market: IBM Watson
Analytics, SAS Analytics, KNIME, and RapidMiner. The comparative analysis will be based on
a set of predefined KPIs.
Using the KPIs, we will assess the ability of these analytical platforms to execute the
business tasks of financial organizations. Based on this assessment, we will run a comparative
analysis of the platforms and generate recommendations regarding which platform to use for
the purposes of stock market forecasting.
Special attention will be paid to one of the KPIs: forecasting accuracy. To compare the
chosen analytical platforms by this KPI, we will build predictive models in IBM Watson
Analytics and in the Gretl statistical package.
First, we will build econometric models for predicting currency exchange rates. For that
purpose, we will build two types of models: ARIMA and factor regressions, which use the
prices of the country's main export commodity.
The next financial assets whose prices we will try to predict are the blue chips of the
stock markets: IBM, Microsoft, P&G, etc. As a theoretical base, we will use the Capital Asset
Pricing Model (CAPM). The United States financial market is one of the most developed;
therefore, its reality is as close to the Efficient Market Hypothesis (EMH) as it gets on real-life
markets.
The last financial assets we will take into consideration are stock indexes. The
importance of considering stock indexes stems from the fact that they serve as a guideline for
traders, analysts, and investors, because they reflect the overall situation on the market.
This is the first phase of the empirical research, and it will be conducted using the Gretl
statistical package. Our next step will be the construction of predictive models for the same
assets, using the same datasets, in all the aforementioned analytical platforms. Apart from
building alternative quantitative models using financial data, we will make use of social big
data by running Twitter analysis with the help of IBM Watson Analytics.
The accuracy of the forecasts will be assessed through two characteristics: Mean
Absolute Percentage Error and the potential profitability of applying the models. Potential
profitability will be estimated through simulation experiments in which we imitate real-life
trading using the given models. We set the investor's behavior as follows: the investor profits
from the difference between the prices of the same asset in two consecutive time periods. If the
model predicts that the price will go up, the investor buys the asset with the intention of selling
it in the next period regardless of its actual price. If the model predicts depreciation of the asset,
it works vice versa.
The result of the research will be a comparative analysis of the forecasting abilities of
some of the main analytical platforms available on the market.
1.7 Conclusion of Chapter I.
The rise of big data in recent years has created challenges and opportunities for every
type of business. To tackle these challenges and not miss the opportunities, it is necessary to
use predictive analytics techniques. Big data consists of vast amounts of structured and
unstructured information and is characterized by the three V's: volume, variety, and velocity.
Different kinds of big data call for different types of data analytics techniques: text analytics,
audio analytics, social media analytics, and predictive analytics.
Big data affects many spheres of business, including trading, as processing and analyzing
immense amounts of data makes it possible for an analyst to uncover interdependencies and
patterns that would otherwise have been ignored. Big data holds the potential to increase the
effectiveness of trading deals on the stock market; therefore, it is a subject of interest for both
individual traders and brokerage firms. A trader's interest in an analytical platform lies in its
capability to explore the data and to find interrelationships and correlations between variables.
These interdependencies and correlations within a dataset could be detected using
traditional statistical methods. However, predictive analytics and conventional statistical
methods are not completely similar, despite the fact that predictive analytics and econometrics
use the same mathematical and statistical toolkit. There is one fundamental difference between
them: in order to build an econometric model, one should find theoretical grounds for it,
formulate a statistically verifiable hypothesis, and test it. This approach leads to the creation of
explanatory models, which describe the factors that drive the observed variable; however, such
models do not have the best predictive accuracy. Predictive analytics, just like econometric
modeling, uses statistical methods, but it differs in the research approach: it does not need to
test a predefined hypothesis; instead, it explores interdependencies between the observed
variable and the whole set of possible predictive factors. As a result, a predictive model is
created, which may, however, lack theoretical explanation and may be more biased than an
explanatory one. Additionally, the advancement of cloud computing has made it possible to run
social media and investor sentiment analysis.
Finally, such characteristics of an analytical platform as text analysis and social media
analysis are of interest to every financial organization (except audit firms, since the
applicability of social media to audit is not confirmed), as the majority of information comes in
unstructured form.
There are many analytical platforms available on the market; we will take into account
only the top five of them, according to Gartner's Magic Quadrant for advanced analytics
platforms (2016). Most of them are available only through the cloud (IBM Watson, Azure
Machine Learning, SAS Visual Analytics, Amazon Machine Learning); however, some
platforms offer their services offline: KNIME and RapidMiner, which were also recognized by
Gartner as market leaders.
The market for advanced analytical platforms is one of the most dynamic. A comparison
of Gartner's Magic Quadrants from 2014 and 2015 reveals serious movements on the market.
However, one player attracts special attention: IBM, with its cloud-based analytical service
called Watson Analytics. IBM has been the market leader for several years, and its service
provides an easy way for a researcher to analyze and visualize huge amounts of data.
The ability to process big datasets simultaneously holds potential for stock market
analysis. Nowadays, there is too much information on the market, coming from multiple
sources; it is impossible to assess all relevant information in a short time, and time is of the
essence when it comes to forecasting a stock market.
Chapter II. Research framework.
2.1 Research goals, KPIs, objectives, questions, and limitations.
The purpose of this work is to provide potentially interested parties (trading firms) with
a comparative analysis of predictive analytics tools and providers, in order to help them decide
which product to use for a particular task.
In the course of this research, we will analyze and compare the main advanced analytical
platforms available on the market. Each advanced analytical platform has its own
characteristics, identified by Ventana Research (2016) as follows:
1. User roles and self-service: this characteristic reflects the ability of a platform to
be used by different kinds of users, with different data analysis skills and
different requirements for analytics.
2. Information optimization: reflects the ability of an analytical platform to
manage different kinds of data flows coming from different sources, and the
ability to refine the data.
3. Range of analytical capabilities: includes visualization capabilities, data
exploration capabilities (uncovering hidden patterns), and the ability to detect
particular events in a dataset.
4. Cloud and Mobile deployment.
5. Time to value: the ability of a platform to perform the analysis and present the
results in the shortest time possible.
From these five KPIs we will take two: User Roles and Self-Service, and Range of
Analytical Capabilities. We will break them down into sub-criteria as follows: Visualization,
Simplicity of Use, Predictive Analytics Capabilities, Range of Econometric Modeling, Textual
Analytics Capabilities, and Social Analytics Capabilities.
After evaluating the analytical platforms using these KPIs, we will analyze how well
each of them addresses the needs of particular kinds of financial organizations. Then we will
provide interested parties with recommendations regarding which advanced analytical platform
to use for each business objective.
In addition, we will look into how the ability of an advanced analytical platform (using
IBM Watson Analytics) to suggest predictive models compares with standard theoretical
approaches to stock price forecasting.
The goal of this research is twofold. First, it is to run a comparative analysis of the main
advanced analytical platforms. Second, it is to assess the ability of IBM Watson Analytics to
suggest effective predictive models for stock price forecasting.
The research questions of this work are as follows:
1. Which analytical platform is a better fit for the purposes of stock price
forecasting?
2. Does an analytical platform (using IBM Watson Analytics as an example)
suggest effective predictive models for stock forecasting, in comparison with
standard theoretically based econometric models?
Research objectives:
1. To evaluate the analytical platforms (IBM Watson Analytics, SAS Analytics,
KNIME, and RapidMiner) using the KPIs mentioned above.
2. To make a comparative analysis of the analytical platforms.
3. To rank them based on their ability to make predictive models for stock market
forecasting.
4. To construct and evaluate theoretically based econometric models for stock
price forecasting.
5. To construct econometric models for stock price forecasting using factors
suggested by the IBM Watson Analytics Predict function.
6. To compare the performance of the theoretically based and the Watson
Analytics-suggested models.
The analytical platforms taken into consideration are IBM Watson Analytics, SAS
Analytics, KNIME, and RapidMiner. This choice is justified by Gartner's Magic Quadrant for
advanced analytics platforms (2016), which identified them as the market leaders.
Limitations:
1. Not all advanced analytical platforms available on the market are considered.
2. The ability of IBM Watson Analytics to suggest predictive models will be
compared only with the most commonly used econometric models: a
comparative analysis with all possible econometric models is impossible, as
there are too many of them, and new ones can always be generated.
3. Not all analytical capabilities of the analytical platforms will be empirically
tested.
4. The simulation of potential profitability is made under the assumption that an
investor has access to all necessary information and reacts to it instantly.
2.2 Methods of evaluation of advanced analytical platforms.
Evaluation of the analytical platforms will be done using the Analytic Hierarchy Process
(AHP). According to Abdullah (2013), AHP is conducted in seven steps:
1. Determination of the hierarchy of criteria and calculation of the normalized
matrix.
2. Determination of the criteria weights.
3. Determination of the eigenvector.
4. Check of the consistency ratio.
5. Comparison of the alternatives.
6. Calculation of the alternatives' scores.
7. Ranking of the alternatives.
The hierarchy of criteria is determined by their relative importance for the goal (car
purchasing, vendor choice, etc.). As a result of a pairwise comparison of the criteria, an
$n \times n$ matrix is created. Its elements reflect the relative value of the criteria to each other:
element $a_{ij}$ indicates the value of criterion $i$ relative to criterion $j$, with $a_{ii} = 1$ and
$a_{ji} = 1/a_{ij}$.
Next, a normalized matrix is defined: each of its elements is obtained by dividing the
corresponding pairwise comparison value by the sum of the pairwise comparison values in its
column.
The criterion weight is determined as the mean of the elements of the corresponding row
of the normalized matrix:

$$\mu_i = \frac{1}{n} \sum_{j=1}^{n} a_{ij} \qquad (1)$$

The eigenvector is determined as follows:

$$w_i = \frac{\sqrt[n]{\mu_i}}{\sum_{i=1}^{n} \mu_i} \qquad (2)$$
The consistency ratio is calculated as:

$$CR = \frac{CI}{RI} \qquad (3)$$

$$CI = \frac{\gamma_{\max} - n}{n - 1} \qquad (4)$$

$$\gamma_{\max} = \sum_{i=1}^{n} \frac{(A w)_i}{n\, w_i} \qquad (5)$$

RI is a random index, which takes values depending on the number of elements $n$. CR
should be no more than 0.1.
The scores of the alternatives are calculated using the following equation:

$$A_{\text{score}} = \max_i \sum_{j=1}^{n} a_{ij} w_j \qquad (6)$$

Based on these scores, the final ranking is constructed. A compact sketch of the whole
procedure is given below.
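The following Python sketch runs the steps above on a hypothetical 3 x 3 comparison matrix. It uses the row-mean weights of equation (1) and the consistency check of equations (3)-(5); the matrix entries and the random index for n = 3 (taken from Saaty's table) are illustrative inputs, not results of this study.

```python
# AHP in brief: normalize the pairwise matrix, derive weights,
# and check the consistency ratio.
import numpy as np

A = np.array([[1.0, 3.0, 5.0],    # pairwise comparisons; a_ji = 1 / a_ij
              [1/3, 1.0, 3.0],
              [1/5, 1/3, 1.0]])
n = A.shape[0]

norm = A / A.sum(axis=0)          # divide each element by its column sum
w = norm.mean(axis=1)             # criteria weights: row means, eq. (1)

gamma_max = ((A @ w) / w).mean()  # eq. (5)
CI = (gamma_max - n) / (n - 1)    # consistency index, eq. (4)
RI = 0.58                         # random index for n = 3 (Saaty's table)
CR = CI / RI                      # consistency ratio, eq. (3); keep <= 0.1

print("weights:", np.round(w, 3), " CR:", round(CR, 3))
```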
2.3 Methods of comparing the forecasting accuracy of IBM Watson and statistical
packages.
The method of research is comparative analysis based on results generated by
quantitative methods. For the purpose of comparing the forecast accuracy of the different tools,
we will consecutively create predictive models in a statistical package and in IBM Watson
using the same dataset.
In order to have the most tried and reliable econometric models for comparison, we will
run forecasts of currency exchange rates, of stock price dynamics, and of stock indexes.
Econometric model building follows three steps:
1. Theory
2. Hypotheses Development
3. Test
We will forecast currency exchange rates using two approaches: ARIMA models, and
linear regression models that use the prices of the most exported commodities as independent
variables. The theoretical foundations of these models can be found in Meese and Rogoff
(1983) and in Rogoff and Rossi (2015).
Additionally, we will build CAPM models for the stocks of the biggest corporations,
such as Google, Microsoft, etc. We will use only the USA stock market for building CAPM
models, because CAPM operates under the assumptions of the EMH (Efficient Market
Hypothesis); therefore, CAPM does not fit developing stock markets.
Forecasting with IBM Watson Analytics differs from building econometric models; the
main difference is that it does not require strong theoretical grounds in order to make a model:
it analyzes the whole dataset and automatically suggests models.
Predictive analytics steps:
1. Data
2. Relationships Development
3. Hypotheses
4. Model Building and Hypothesis Testing
5. Model Validation
As we can see, IBM Watson lacks theoretical grounds for model building, but the best
predictive models are not necessarily the best theoretically grounded ones, as stated by
Shmueli, G. (2010).
The comparison of the predictive models will be based on two indicators:
1. Mean Absolute Percentage Error (MAPE)
2. Potential profitability
Potential profitability will be estimated as the profit generated by the given model
during a simulation.
The simulation will be run in accordance with the following rules:
1. If the model predicts that the price of the asset will rise in the next period, the
investor decides to buy the asset.
2. If the model predicts that the price of the asset will fall in the next period, the
investor decides to sell the asset.
3. If the investor bought the asset, he sells it in the next period regardless of its
new price.
4. If the investor sold the asset, he buys it back in the next period, regardless of
its new price.
At the end of the chosen period, the investor stops and calculates his or her returns,
which will be used as an indicator of the forecasting accuracy of the model. In order to have a
more reliable indicator of forecasting accuracy, we will run a model simulating real-life trading.
The rules of the model are simple: if it anticipates that the asset's price will increase in the next
period, the investor decides to buy the asset with the intention of selling it afterwards.
Depending on the actual change in prices, such operations can bring profits or losses. A
minimal sketch of both accuracy measures follows.
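The sketch below implements MAPE and the trading rules above, assuming frictionless trading of one unit per period and instant reaction (limitation 4 above); the price and forecast series are invented for illustration.

```python
# MAPE plus the trading simulation described by rules 1-4 above.
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return float(np.mean(np.abs((actual - forecast) / actual))) * 100

def simulate_profit(prices, forecasts):
    """Buy before a predicted rise, sell before a predicted fall;
    every position is closed at the next period's actual price."""
    profit = 0.0
    for t in range(len(prices) - 1):
        if forecasts[t + 1] > prices[t]:   # predicted rise -> buy now
            profit += prices[t + 1] - prices[t]
        else:                              # predicted fall -> sell now
            profit += prices[t] - prices[t + 1]
    return profit

prices    = [100.0, 102.0, 101.0, 104.0]
forecasts = [100.0, 101.5, 102.0, 103.0]  # one-step-ahead model forecasts
print("MAPE, %:", round(mape(prices[1:], forecasts[1:]), 2))
print("profit:", simulate_profit(prices, forecasts))
```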
2.4 Method of currency exchange rate forecasting using Statistical Packages
Since the publication of the highly cited article by Meese, R. (1983), it has become a sort
of benchmark to compare all currency exchange rate models with the random walk model,
which performs no worse than any other model.
However, some more recent research, such as Moosa, I. (2014), argues that the
unbeatable random walk is in fact an illusion: the random walk model seems superior only if it
is evaluated in terms of mean square error, mean absolute error, or root mean square error,
whereas if a model is evaluated by its direction-forecasting power and profitability, the random
walk loses to almost all other models.
A random walk is a type of non-stationary time series, defined as follows:

$$X_t = X_{t-1} + e_t \qquad (7)$$

where $X_t$ is the observable variable and $e_t$ is a pure random component.
The difference between the random walk and the autoregression AR(1) is that the effect
of every random component is preserved forever. If the process begins at $t = 0$, then:

$$X_t = X_0 + e_1 + \ldots + e_t \qquad (8)$$

In a more general case there is a constant $B_1$, which turns the process into a random
walk with a trend:

$$X_t = X_0 + B_1 t + e_1 + \ldots + e_t \qquad (9)$$
Another non-stationary process is a time series with a deterministic trend:

$$X_t = B_1 + B_2 t + e_t \qquad (10)$$

The main difference between this model and the random walk is that a time series with a
deterministic trend tends to return to the trend line, while a random walk with a trend does not
necessarily return to it, as the small simulation below illustrates.
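The following sketch simulates equations (9) and (10) with arbitrary drift and noise parameters: the random walk with drift wanders arbitrarily far from the trend line, while the trend-stationary series keeps returning to it.

```python
# Random walk with drift vs. a deterministic-trend series.
import numpy as np

rng = np.random.default_rng(42)
T = 500
e = rng.normal(size=T)

random_walk = np.cumsum(0.05 + e)         # drift plus accumulated shocks, eq. (9)
trend_line = 0.05 * np.arange(1, T + 1)   # the common deterministic trend
trend_stationary = trend_line + e         # noise around the trend, eq. (10)

print("random walk, max |deviation from trend|:",
      round(float(np.abs(random_walk - trend_line).max()), 1))
print("trend-stationary, max |deviation from trend|:",
      round(float(np.abs(trend_stationary - trend_line).max()), 1))
```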
One more approach to currency exchange rate forecasting is a regression model based on
the prices of the main export commodities of a given country. Such a model was tested by
Ferraro, D. (2015). They tried to forecast the US dollar - Canadian dollar, US dollar -
Australian dollar, US dollar - Norwegian krone, US dollar - South African rand, and US dollar -
Chilean peso exchange rates based on the prices of oil, gold, and copper. The results revealed
short-term relationships between the price of a country's main export commodity and its
currency's nominal exchange rate.
However, the applicability of such a model is limited to countries that have a small
number of main export commodities, meaning that the exchange rates of most developed
countries' currencies cannot be predicted using this model.
As an approach to currency exchange rate forecasting, we will use ARMA and ARIMA
autoregression models.
The ARMA model is defined as follows:
X_t = B_1 + B_2 X_{t-1} + ... + B_{p+1} X_{t-p} + e_t + a_2 e_{t-1} + ... + a_{q+1} e_{t-q}   (11)
where X_t is the observable variable, B_i are the coefficients which determine the influence
of the previous observations on the current one, and a_i are the coefficients which determine
the influence of the previous random components on the current observation. The ARIMA model
applies the same structure to a series differenced d times, so that non-stationary series can be
modeled as well.
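A minimal sketch of fitting such a model, assuming the statsmodels library and a hypothetical exchange-rate series:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    # hypothetical daily exchange-rate series
    rng = np.random.default_rng(1)
    rate = 1.10 + np.cumsum(rng.normal(scale=0.002, size=300))

    # ARIMA(1,1,1): one AR term, one MA term, first-differenced once;
    # ARIMA(0,1,0) would reduce to the random walk used for comparison.
    model = ARIMA(rate, order=(1, 1, 1)).fit()
    print(model.forecast(steps=5))   # next five one-step-ahead forecasts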
2.5 Methods of stock forecasting using Statistical packages
Econometrical methods of financial market analysis have a strong mathematical and
statistical grounding. However, their applicability is limited due to the assumptions upon which
econometrical models are based.
Most theoretical models of stock market forecasting require the so-called Efficient Market
Hypothesis.
The efficient market hypothesis refers mostly to the information effectiveness of a market.
It implies that information is equally available to all participants of the market; they interpret it
in a similar manner and instantly use it to adjust their strategies and operations.
In addition, efficient market theory suggests that all players are rational, have similar goals
and use similar strategies.
The main characteristic of an efficient market is a result of the realization of all the
aforementioned assumptions. If a market is efficient, then the prices of assets instantly, completely
and correctly assimilate all available and relevant information, and reach equilibrium, thus
making the regular gain of abnormal incomes impossible.
In an efficient market, it is considered that expected returns include all systematic risks
and provide investors with acceptable returns, consistent with all other assets of a similar risk level.
One of the basic models based on efficient market theory is the Capital Asset Pricing Model
(CAPM). Its main equation looks as follows:
mu_i = R_0 + beta_i (mu_M - R_0)   (11)
where mu_i is the expected return of any given asset; R_0 is the risk-free return; beta_i is the beta
coefficient, reflecting the nature of the asset (riskier and more profitable assets have beta_i > 1,
and less risky and less profitable ones have beta_i < 1); mu_M is the average market return on assets.
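As a worked illustration of equation (11), with assumed, hypothetical numbers:

    # assumed values: risk-free rate 2%, beta 1.2, average market return 8%
    R0, beta, muM = 0.02, 1.2, 0.08
    mu = R0 + beta * (muM - R0)   # CAPM equation (11)
    print(mu)                     # 0.092, i.e. an expected return of 9.2%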
CAPM is based on a list of assumptions:
1. Investors evaluate assets using their expected returns and risks
2. Expected returns are stochastic
3. Risk is measured as the dispersion of returns
4. Investors are trying to maximize their assets' returns
5. Investors are risk averse
6. Absence of a monopolistic influence on the market
7. Absence of taxes
8. Absence of transaction costs
9. Absence of unexpected inflation
10. Assets are infinitely divisible
11. No limitations on borrowing and lending at the risk-free rate
12. All investors have a similar planning horizon
13. All investors evaluate the probability distribution of expected returns in the same way
14. Information is free and all investors have equal access to it.
It is evident from the list that the CAPM assumptions are unrealistic, as nearly half of
them contradict the reality of the actual financial market. However, this model serves as a base from
which other, more realistic models could be derived. This is done by relaxing some of the
aforementioned assumptions, thus making the model more applicable for actual forecasting.
Another class of econometric models is factor models. These models assume that the
expected return of an asset could be determined as a reaction to changes in some economic
factors, such as GDP, inflation or oil prices.
A factor model tries to consider the main economic factors influencing the prices of assets. It
implies that any two given stocks are correlated with each other only through common economic
factors. Every factor influencing the expected return of a given asset which is not in the model is
considered unique; therefore it doesn't correlate with the unique factors of other assets.
2.6 Conclusion of Chapter 2.
In the course of this research, we will evaluate the abilities of the top analytical platforms (IBM
Watson Analytics, SAS Analytics, KNIME, and RapidMiner) to serve the needs of banks, audit
firms, insurance companies, and traders by assessing these platforms using a set of two main
KPIs: User-Friendliness and Range of Analytical Capabilities. These main KPIs are subdivided
into six criteria: Visualization, Simplicity of Use, Predictive Analytics, Econometric Modeling,
Textual Analytics, and Social Media Analytics.
Then we will evaluate the chosen analytical platforms using the Analytical Hierarchy Process
and the set of KPIs mentioned above. The AHP will be performed through a series of pairwise
comparisons, which will determine the relative weights of the criteria and the ranking of the
alternatives (IBM Watson Analytics, SAS Analytics, KNIME, and RapidMiner). After
conducting the AHP, we will determine the most appropriate analytical platform for the purposes of
stock market forecasting.
In the case of statistical packages, we will use theoretically based econometric models, and
in the case of IBM Watson Analytics we will let the platform suggest optimal models by itself.
This approach has a potential problem: the lack of theoretical grounding. For a trader, it may appear
to be irrelevant, since he/she mostly cares about the accuracy of forecasts; however, without a
theoretical basis it is impossible to guarantee the stability of the model: it could have just
happened that the factors which affected the predicted variables are spuriously correlated.
In the research, we will build several series of models. The first one will be standard random
walk models for currency exchange rates. It will be used for comparison with the other models,
since they will make sense only if they outperform the random walk.
Another series of predictive models for currency exchange rates will be constructed
using simple one-factor models that use the price of the most exported commodity as a predictor. The
dynamics of the stock market will be analyzed by applying the Capital Asset Pricing Model to the blue
chips of the United States stock exchange: Microsoft, Apple, IBM, Bank of America, Walmart,
and P&G. The US stock market was chosen because of the necessity of operating under the
Efficient Market Hypothesis, which is more likely to hold in a developed market than in an
emerging one.
The final series of predictive models will be constructed in Gretl, but in this case, the factors will
be chosen based on the suggestions of IBM Watson Analytics, which automatically determines
the drivers of a given variable.
The predictive accuracy of the forecasts generated by the aforementioned models will be
estimated by two characteristics: Mean Absolute Percentage Errors and potential profitability.
The latter characteristic will be assessed through the results of a trading simulation experiment,
during which we will imitate real-life trading using all of the models we have constructed.
Chapter 3. Empirical estimation of analytical platforms.
3.1 Evaluation of the Analytical Platforms
3.1.1 Justification of the choice of analytical platforms taken for consideration
According to the Gartner Magic Quadrant for advanced analytical platforms (2016), the
market is divided into four categories: Leaders, Challengers, Visionaries, and Niche players. This
classification is based on the vendors' ability to execute (performance metrics) and their completeness
of vision, which could be interpreted as future prospects. See Figure 3.
There are several vendors and analytical products in each category. In this research we
focus our attention on the leaders and visionaries. Among the market leaders are IBM, SAS,
Dell, KNIME, and RapidMiner. Visionaries are represented by Microsoft, Alteryx, Alpine Data,
and Predixion Software.
Another report, the Forrester Wave (2015), suggests a different picture, based on
current performance and strategy: IBM and SAS remain the leaders. However, KNIME,
RapidMiner and Dell are removed from the leaders section and ranked as strong performers. See
Figure 4.
According to both reports, IBM is the market leader: it shares this place with SAS, but
according to the Forrester Wave (2015), it has a better perspective for the future (a higher strategy rank).
KNIME and RapidMiner occupy similar positions in both rankings; they also offer similar
approaches to analytics - both are available offline and provide clients with a good cost-benefit ratio, as
Piatetsky (2016) states. So, we will choose four platforms for further analysis: IBM Watson,
SAS, KNIME, and RapidMiner. Dell is set aside because it is noticeably behind the other leaders.
Figure 3. Gartner’s Magic Quadrant for Advanced Analytical platforms 2016
Figure 4. Forrester Wave 2015
3.1.2 Results of Evaluation of Analytical Platforms
We will apply a simple Analytical Hierarchy Process method, using the BPMSG AHP Online
System (http://bpmsg.com/academic/ahp.php). The goal of the AHP is to choose the most appropriate
analytical platform for stock price forecasting. According to Gartner's Magic Quadrant for
Advanced Analytics (09 February 2016, ID: G00275788), the key alternatives are IBM Watson,
SAS, KNIME, and RapidMiner. We have two main criteria: User-Friendliness and Range of
Analytical Capabilities. These criteria could be broken down into sub-criteria as follows:
1. User-Friendliness: Visualization, and Simplicity of Use.
2. Range of Analytical Capabilities: Predictive Analytics, Econometric Modeling,
Textual Analytics, and Social Media Analytics.
The relative weights of these criteria were determined through a series of pairwise
comparisons. The comparisons were made as follows:
1. Range of Analytical Capabilities is more important than User-Friendliness.
2. Simplicity of Use is more important than Visualization.
3. Predictive Analytics is equally important to Econometric Modeling; Predictive
Analytics is more important than Textual Analytics and Social Media Analytics;
Econometric Modeling is more important than Textual Analytics and Social
Media Analytics; Textual Analytics is equally important to Social Media
Analytics.
The descriptions of these criteria are presented in the Table 1, and the results of this pairwise
comparison in the BPMSG AHP Online System are presented in the Table 2 (a short sketch
reproducing the resulting weights follows Table 2).
Table 1. Description of the criteria.
Criteria | Description
Visualization | Refers to the quality of data and analysis visualization provided by a platform.
Simplicity of Use | Refers to the requirements of IT and statistical expertise.
Predictive Analytics | The ability to suggest predictive factors.
Econometric Modeling | The range of statistical and econometrical tools which a platform provides.
Textual Analytics | Reflects the range of textual analytics techniques provided by a platform.
Social Media Analytics | Reflects mostly the range of social media and news sources which a platform is capable of analyzing.
Table 2. Decision Hierarchy
Level 0 | Level 1 | Level 2 | Global Priorities
Analytical Platform | User-Friendliness | Visualization | 11.1 %
Analytical Platform | User-Friendliness | Simplicity of Use | 22.2 %
Analytical Platform | Range of Analytical Capabilities | Predictive Analytics | 22.2 %
Analytical Platform | Range of Analytical Capabilities | Econometric Modeling | 22.2 %
Analytical Platform | Range of Analytical Capabilities | Textual Analytics | 11.1 %
Analytical Platform | Range of Analytical Capabilities | Social Media Analytics | 11.1 %
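A minimal sketch of the underlying calculation, assuming numpy and assuming that each "more important" judgment above was entered with a 2:1 ratio (which reproduces the published global priorities):

    import numpy as np

    def ahp_weights(matrix):
        """Principal-eigenvector weights of a pairwise comparison matrix."""
        vals, vecs = np.linalg.eig(np.asarray(matrix, dtype=float))
        w = np.real(vecs[:, np.argmax(np.real(vals))])
        return w / w.sum()

    # Level 1: Range of Analytical Capabilities vs User-Friendliness (2:1)
    top = ahp_weights([[1, 1/2], [2, 1]])          # -> [1/3, 2/3]
    # User-Friendliness: Simplicity of Use vs Visualization (2:1)
    uf = ahp_weights([[1, 1/2], [2, 1]])           # -> [Vis 1/3, Simp 2/3]
    # Range: Predictive = Econometric > Textual = Social (2:1)
    rc = ahp_weights([[1, 1, 2, 2], [1, 1, 2, 2],
                      [1/2, 1/2, 1, 1], [1/2, 1/2, 1, 1]])

    print(top[0] * uf)   # ~[0.111, 0.222] for Visualization, Simplicity of Use
    print(top[1] * rc)   # ~[0.222, 0.222, 0.111, 0.111]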
Our next step is to evaluate the alternatives using these criteria. The results of the
evaluation are shown in the Table 3.
Table 3. Evaluation of the Platforms (sources: Bloor Group, KNIME Documentation,
SAS Product Documentation, and RapidMiner Documentation).

Visualization
Platform | Priority | Rank
SAS | 39.5% | 1
IBM Watson | 27.8% | 2
KNIME | 16.3% | 3
RapidMiner | 16.3% | 3
Comments: This ranking is based on how well visualization is integrated into the analytical process.

Simplicity
Platform | Priority | Rank
IBM Watson | 39.5% | 1
SAS | 27.8% | 2
KNIME | 16.3% | 3
RapidMiner | 16.3% | 3
Comments: IBM occupies the first place because it doesn't require deep statistical expertise from the user and offers a simple interface. The reason why both KNIME and RapidMiner hold the 3rd rank is that they require some level of statistical expertise and have more complicated interfaces.

Predictive Analytics
Platform | Priority | Rank
IBM Watson | 30.0% | 1
SAS | 30.0% | 1
KNIME | 20.0% | 2
RapidMiner | 20.0% | 2
Comments: Both IBM Watson and KNIME directly state the predictive analytics function.

Econometric Modeling
Platform | Priority | Rank
SAS | 28.6% | 1
KNIME | 28.6% | 1
RapidMiner | 28.6% | 1
IBM Watson | 14.3% | 2
Comments: All platforms except for IBM Watson offer a broad range of econometrical and statistical models, while IBM Watson has replaced it with Data Exploration and Predictive functions.

Textual Analytics
Platform | Priority | Rank
IBM Watson | 40.0% | 1
SAS | 20.0% | 2
KNIME | 20.0% | 2
RapidMiner | 20.0% | 2
Comments: All of the platforms offer textual analytics functions, but IBM Watson is the only one capable of answering questions formulated in natural language.

Social Media Analytics
Platform | Priority | Rank
IBM Watson | 28.6% | 1
SAS | 28.6% | 1
KNIME | 28.6% | 1
RapidMiner | 14.3% | 2
Comments: All platforms have social media analytics functions; however, RapidMiner can analyze only Twitter.

After evaluating the alternatives (IBM Watson, SAS, KNIME, and RapidMiner) in the
BPMSG AHP Online System, we have the results, which are presented in the Table 4.
Table 4. Ranking of Analytical platforms.
Platform | Priority | Rank
IBM Watson | 29.4% | 1
SAS | 29.0% | 2
KNIME | 21.6% | 3
RapidMiner | 20.0% | 4
As we can see in the Table 4, IBM and SAS are almost equal with regard to their
suitability for stock price forecasting, according to the AHP method. Overall, the result is
consistent with Gartner's Magic Quadrant and the Forrester Wave. However, IBM Watson has
scored a bit better, which is why we will use it for our further analysis, presented in the
next chapter.
3.2 Evaluation of the forecasting accuracy of IBM Watson Analytics
3.2.1 Data description
In the Table 5, we can see a description of the data we will use in the stock market
forecasting experiments. Variables are classified into four categories: stock prices, prices of
resources (gold, oil, and natural gas), values of market indexes, and currency exchange
rates. Observations cover the period from 01.30.2015 to 01.04.2016.
We will use two types of software to run the predictive modeling: the Gretl statistical
package and IBM Watson Analytics. The type of models marked as IBM+Gretl in the
Table 5 was built in the following steps: after uploading the dataset to IBM Watson Analytics, the
predictive function was applied. It suggested predicting factors for each target variable
(stocks and currency exchange rates); after that, simple two-factor regression models were
built in Gretl, using the predictive factors suggested by IBM Watson Analytics as
independent variables. The random walk models are basically just ARIMA(0,1,0) models. They
will be used just as a basis for comparison.
Table 5. Data description (Source: Finam).
Software | Model | Variables | Number of observations
Gretl | Random Walk Models | EUR/USD | 429
Gretl | Random Walk Models | USD/CAD | 426
Gretl | Random Walk Models | USD/JPY | 428
Gretl | Random Walk Models | USD/ZAR | 426
Gretl | Random Walk Models | USD/NOK | 425
Gretl | Random Walk Models | USD/CNY | 393
Gretl | Random Walk Models | USD/RUB | 426
Gretl | One-Factor models | USD/NOK, USD/ZAR, USD/RUB; BRENT, Gold | 363
Gretl | CAPM | S&P 500; BAC, IBM, MSFT, P&G, Walmart, Apple | 286
IBM+Gretl | Two-Factor models | S&P 500, DJI, RTS, Nikkei, CSI, FTSE, Shanghai, NASDAQ, Gold, Natural Gas, Brent, EUR/USD, USD/CAD, USD/JPY, USD/ZAR, USD/NOK, USD/CNY, USD/RUB, Exxon Mobil, Chevron, BAC, IBM, P&G, Walmart, Apple | 225
One-factor models predict the currency exchange rates based on the prices of the
most exported commodities (oil and gold). CAPMs predict the prices of the stocks. They were built
using weekly prices of the "blue chips" of the US stock market. The role of the average market
indicator was played by the S&P 500 index. The interest rate of the 4-week Treasury bills was used as
the risk-free rate (Rfr = 2%). The return on an asset is calculated as the difference between the stock's price
at moment t and the stock's price at moment t-1, divided by the stock's price at moment t-1:
R = (P_t - P_{t-1}) / P_{t-1}   (12)
Specifications of the models are shown in the Appendix 1 and Appendix 2.
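A minimal sketch of these two recurring calculations (assuming numpy arrays of actual and forecasted prices; MAPE as used throughout this chapter):

    import numpy as np

    def returns(prices):
        """Period returns per equation (12)."""
        prices = np.asarray(prices, dtype=float)
        return (prices[1:] - prices[:-1]) / prices[:-1]

    def mape(actual, forecast):
        """Mean Absolute Percentage Error, in percent."""
        actual, forecast = np.asarray(actual), np.asarray(forecast)
        return 100 * np.mean(np.abs((actual - forecast) / actual))

    prices = [100.0, 102.0, 101.0]           # hypothetical weekly prices
    print(returns(prices))                   # [0.02, -0.0098...]
    print(mape([1.10, 1.12], [1.11, 1.10]))  # ~1.35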
3.2.2 Forecasting stock prices with theoretically based models.
3.2.2.1 Results of the Random Walk models for currencies.
The random walk model is the basis for comparison for any other forecasting model, as a
predictive model makes sense only if it beats the random walk. Using the Gretl statistical and
econometrical package, we have built ARIMA(0,1,0) time series models, which are equivalent to
the simple random walk. In the Table 6, we can see the error metrics for the random walk models
for currency exchange rates.
Table 6. Percentage errors of Random Walk models
Model | MPE | MAPE
EUR/USD | 0.004209 | 0.004209
USD/CAD | -0.0011137 | 0.056509
USD/NOK | -0.077933 | 0.077933
USD/RUB | -0.27649 | 0.27649
USD/ZAR | 8.1111 | 8.1111
USD/CNY | -0.0089793 | 0.0089793
USD/JPY | -0.032764 | 0.032764
As we can see in the Table 6, the random walk models have produced quite small mean
percentage errors and mean absolute percentage errors, with the exception of the USD to South
African Rand exchange rate (ZAR). This might give the impression that the random walk performs
well; however, as supported by Elliot, G. (2013), for the purposes of profiting from the
differences in exchange rates, it is more important to foresee the direction of change than
to give a more accurate estimation. The low percentage error in the random walk case could be
caused by the fact that the forecasted value differs from the previous observation only by a small
random value.
Our next step is to estimate the potential profitability of trading the main currencies using the
random walk model. For that purpose, we have run the simulation test in Excel 2013, using the
following rule: if the investor expects appreciation of the asset, then he buys it, and vice versa. The
results are shown in the Table 7. We have used the 30 last forecasted values of each currency
exchange rate for an imitation of real-life trading.
Table 7. Results of the simulation of Random Walk
Trading simulation (Random Walk)
Model | Profitability
EUR/USD | 0.39%
USD/CAD | -6.20%
USD/NOK | -1.38%
USD/RUB | -17.45%
USD/ZAR | -0.46%
USD/CNY | 0.96%
USD/JPY | 1.64%
As expected, the results of the simulation reveal that the random walk model is absolutely unfit
for trading: in 4 out of 7 cases, the profitability is negative, especially in the case of the Ruble, which
has shown over -17% losses. Even the positive examples have very low profitability. The average
return is -3.2%, and if it were real-life trading, then the losses would be even bigger, as there are
transaction costs and time lags. Thus, it is safe to conclude that the random walk model is
completely unfit for real-life application.
3.2.2.2 Currency exchange rate forecasting using factor models.
Table 8 presents the description of the factor models. In accordance with Domenico, F. (2015),
we have built predictive models for currency exchange rate forecasting using the prices of the most
traded commodities as predictors. The models were built in the Gretl econometrical package using
the "ordinary least squares" option.
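A minimal sketch of an equivalent estimation, assuming the statsmodels library and hypothetical Brent and USD/NOK series (the coefficients here are assumed, loosely echoing the magnitudes in Table 8):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    brent = 60 + rng.normal(scale=5, size=200)   # hypothetical Brent prices
    usdnok = 10.1 - 0.039 * brent + rng.normal(scale=0.2, size=200)

    X = sm.add_constant(brent)            # const + one commodity-price factor
    model = sm.OLS(usdnok, X).fit()       # ordinary least squares
    print(model.params, model.rsquared)   # coefficients and R-squared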
Table 8. Description of factor models for currencies.
Model | Parameters (Coefficient, Sig.) | R-squared | MPE | MAPE
USD/NOK | const 10.1061 (<0.0001); Brent -0.039167 (<0.0001) | 0.79542 | -3.6611 | 3.6611
USD/RUB | const 83.0842 (<0.0001); Brent -0.48308 (<0.0001) | 0.770869 | 4.0127 | 4.0127
USD/ZAR | const 26.4943 (<0.0001); Gold -0.011423 (0.0003) | 0.169913 | 14.503 | 14.503
USD/CAD | const 1.4658 (<0.0001); Brent -0.0041481 (<0.0001) | 0.771673 | -0.24727 | 4.19840
All factors are statistically significant, and they have the expected influence on every
currency (the higher the price of the commodity, the lower the USD exchange rate). However, these
models demonstrate bigger mean percentage errors than the random walk; in that sense, they
don't beat the random walk.
Three out of four models have a high R-squared (>0.7), which implies good explanatory
power. The only exception is the USD/ZAR model, which has a very low R-squared
(0.169) and the highest Mean Absolute Percentage Error (14%). This result suggests that
gold is no longer the main export product of South Africa.
Our next step is to estimate the potential profitability of trading the main currencies using the
simple one-factor regressions. For that purpose, we have run the simulation test in Excel 2013. We have
used the 30 last forecasted values of each currency exchange rate for an imitation of real-life
trading.
As shown in the Table 9, trading with the factor models brings much higher returns than
the random walk, because the factor models manage to generate more accurate predictions of the
direction of the price change. The average return for these models is 26%.
Table 9. Results of the simulation of the factor models.
Trading simulation (Factor regression)
Model | Factors | Profitability
USD/CAD | Brent | 0.2151143
USD/RUB | Brent | 0.3132015
USD/ZAR | Gold | 0.2284068
USD/NOK | Brent | 0.2875645
3.2.2.3 Stock forecasting using the CAPM model.
Using the "ordinary least squares" function in the Gretl statistical package, we have built a CAPM
for each of the following stocks: Apple, IBM, Microsoft, Procter & Gamble, Walmart, and Bank of
America. As a factor, we have used the risk premium:
RP = (mu_M - R_0)   (13)
where mu_M is the return on the S&P 500 index, and R_0 is the four-week Treasury bill interest rate.
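A minimal sketch of this estimation, assuming statsmodels and hypothetical weekly return series (rp is the risk premium from equation (13); the beta of 1.3 is an assumed value):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    r0 = 0.02 / 52                                  # weekly risk-free rate
    market = rng.normal(0.001, 0.02, size=280)      # hypothetical S&P 500 returns
    stock = 1.3 * (market - r0) + rng.normal(0, 0.01, size=280)

    rp = market - r0                                # risk premium, equation (13)
    capm = sm.OLS(stock, sm.add_constant(rp)).fit()
    print(capm.params)    # [alpha, beta]; beta close to the assumed 1.3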
As we can see in the Table 10, the CAPM produces quite poor results both in terms
of explanatory power (low R-squared) and accuracy of forecasts; sometimes the mean percentage
errors exceed 100% (average MAPE = 177%), meaning that the forecasts are radically different
from reality. Despite the fact that in all cases the risk premium was a significant factor, and the
R-squared is tolerable (except for the Walmart case), the models appear to be unfit for actual
forecasting, because of the huge deviations of the forecasted values from the actual ones.
Table 10. Description of CAPM for stocks.
Model | Parameters (Coefficient, Sig.) | R-squared | MPE | MAPE
Bank of America | const 0.000181202 (<0.0001); RP 1.32574 (<0.0001) | 0.549330 | 574.13 | 574.13
Microsoft | const 0.00130152 (0.0833); RP 1.13318 (<0.0001) | 0.464251 | 202.74 | 258.91
Apple | const -0.00068693 (<0.0001); SP 1.14488 (<0.0001) | — | — | —
Walmart | const -0.0015747 (0.0246); RP 0.592545 (<0.0001) | 0.214836 | 59.562 | 59.562
IBM | const -0.000336033 (<0.0001); RP 0.919689 (<0.0001) | 0.453587 | 72.152 | 72.152
P&G | const -0.000790775 (0.0787); RP 0.671332 (<0.0001) | 0.459370 | 66.181 | 66.181
Our next step is to estimate the potential profitability of trading the blue-chip stocks using the
CAPM. For that purpose, we have run the simulation test in Excel 2013. We have used the 30
last forecasted values of each stock for an imitation of real-life trading.
The results of the trading simulation (Table 11) confirm the point that the CAPM is unfit for
stock market forecasting. The CAPM has generated a significant potential outcome in only 2 out of 6
cases; in two cases, the results were negative, and the last two have demonstrated negligible
profits, which would not even cover transaction costs. The average return is 5%, which
demonstrates that despite the huge deviations of the forecasted values from the actual ones, in some cases
the CAPM still correctly predicts the direction of change.
Table 11. Results of the CAPM simulation.
Model | Profitability
BAC | 0.1774171
IBM | 0.19201815
MSFT | 0.03278492
P&G | -0.0342963
Walmart | 0.01295297
Apple | -0.0838422
3.2.3 Forecasting the stock market using IBM Watson Analytics.
3.2.3.1 Models for stock forecasting.
We have used the free version of IBM Watson Analytics to conduct our experiment. After
uploading our dataset consisting of 26 variables, the IBM Watson Predict option automatically
processed and analyzed the uploaded data. The result is a set of suggested predictive factors that
drive any given variable. Based on the predictive power of the models, as estimated by Watson
Analytics, we have chosen the most promising ones. Forecasting of stock prices and currency
exchange rates using IBM Watson will be done using the IBM Watson Analytics "Predict" function
in two steps:
1. Choosing the factors which IBM Watson Analytics suggests as the best predictors.
2. Building a two-factor regression using the Ordinary Least Squares method in the Gretl
statistical package (a minimal sketch of this step follows the list).
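A minimal sketch of step 2, assuming statsmodels and hypothetical index and exchange-rate series (the coefficients are assumed numbers, loosely echoing the magnitudes reported later in Table 13):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 225
    dji = 17000 + rng.normal(scale=500, size=n)     # hypothetical index level
    usdcny = 6.4 + rng.normal(scale=0.1, size=n)    # hypothetical exchange rate
    stock = 390 + 0.005 * dji - 57.8 * usdcny + rng.normal(scale=3, size=n)

    X = sm.add_constant(np.column_stack([dji, usdcny]))  # const + two factors
    model = sm.OLS(stock, X).fit()
    print(model.params)      # [const, factor 1, factor 2]
    print(model.rsquared)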
In the Table 12, we can see which variables were chosen as predictors and which were
chosen as targets. The results of applying the IBM Watson Analytics Predict function are shown in
the Appendix 4.
Table 12. Targets and inputs for stock models.
Targets - Prices of stock: 12. Exxon Mobil; 13. Chevron; 14. BAC; 15. IBM; 16. P&G; 17. Walmart; 18. Apple.
Input - Stock Indices & resource prices: 1. S&P 500; 2. DJI; 3. RTS; 4. Nikkei; 5. CSI; 6. FTSE; 7. Shanghai; 8. NASDAQ; 9. Gold; 10. Natural Gas; 11. Brent.
Using the suggested drivers of the predicted values, we have built regression models in the Gretl
statistical package for each of the observed stocks. The results are presented in
the Table 13.
Table 13. Description of models built based on IBM Watson suggestions.
Model | Parameters (Coefficient, Sig.) | R-square | MPE | MAPE | Predictive Power (%)
Exxon Mobil 1 | const -4.57266 (0.0226); Gold 0.00725082 (<0.0001); Futsee 100 0.0339528 (<0.0001) | 0.874521 | 78.6593 | 1.19685 | 87.4
Exxon Mobil 2 | const -46.8102 (<0.0001); DJI 0.0405594 (<0.0001); Gold 0.00467673 (<0.0001) | 0.827575 | -0.05769 | 1.8791 | 85.2
Exxon Mobil 3 | const -53.5047 (<0.0001); SP 500 0.0398747 (<0.0001); Gold 0.0463417 (<0.0001) | 0.762696 | -0.079892 | 2.2036 | 83.6
IBM 1 | const — (<0.0001); Brent — (0.0916); NKK225 0.000636722 (<0.0001) | 0.712977 | -0.10002 | 2.3478 | 93.0
IBM 2 | const 241.98 (<0.0001); NASDAQ100 0.00448579 (<0.0001); USDZAR -8.25771 (<0.0001) | 0.817450 | 7.5400 | 7.5400 | 92.3
P&G | const 151.348 (0.0640); USDJPY -0.683958 (—); Brent 0.197202 (—) | 0.305893 | -0.19997 | 3.199 | 82.4
Bank of America | const -19.479 (<0.0001); Natural Gas 2.11465 (<0.0001); NASDAQ100 0.00683298 (<0.0001) | 0.928082 | -0.066927 | 2.0298 | 93.7
Apple 1 | const 390.097 (<0.0001); DJI 0.00533233 (<0.0001); USDCNY -57.7748 (<0.0001) | 0.934771 | 1.2012 | 1.2012 | 96.3
Apple 2 | const -9.95716 (0.0590); Brent 0.879771 (<0.0001); NASDAQ100 0.0186525 (<0.0001) | 0.627249 | -1.5062 | 1.5062 | 94.5
Walmart 1 | const 191.434 (<0.0001); USDZAR -5.50512 (<0.0001); NKK225 -0.00250334 (<0.0001) | 0.784290 | -1.4992 | 1.4992 | 93.3
Walmart 2 | const -33.1095 (<0.0001); NKK225 -0.00304983 (<0.0001); Footse100 0.0248156 (<0.0001) | 0.601377 | -1.7314 | 1.7314 | 91.3
Walmart 3 | const 151.821 (<0.0001); Brent 0.695071 (<0.0001); USDJPY -0.971299 (<0.0001) | 0.754404 | -0.29461 | 4.295 | 89.3
Chevron 1 | const -76.5478 (<0.0001); Gold 0.0393618 (<0.0001); Footse100 0.0192772 (<0.0001) | 0.808380 | -0.20033 | 3.493 | 90.7
Chevron 2 | const 119.11 (<0.0001); NASDAQ100 0.0063668 (0.0502); USDZAR -3.97567 (<0.0001) | 0.425434 | -0.6588 | 6.4729 | 88.7
Coke 1 | const 43.2875 (<0.0001); Gold 0.00589016 (<0.0001); Natural Gas -3.42148 (<0.0001) | 0.607398 | -0.0641 | 2.0372 | 79.9
Coke 2 | const -18.4188 (0.0002); USDCNY 7.84844 (<0.0001); Gold 0.00890342 (<0.0001) | 0.407077 | 7.5483 | 7.5483 | 82.3
Analyzing the results, we can see that three of the models (IBM 1, IBM 2, and
Chevron 2) turned out to be statistically insignificant. That strange result could be explained by
the fact that some potentially important predictors were not included in the uploaded dataset:
IBM Watson simply didn't have enough data to generate good models for these stocks.
The R-squared is high, or at least tolerable, in all cases with the exception of P&G.
Additionally, there are two models with borderline explanatory power, Coke 2 and Chevron 2,
whose R-squared equals 0.407 and 0.425 respectively. The mean percentage errors are quite low, but
they are still higher than those of a random walk model.
As a next step, we have estimated the potential profitability of trading stocks using the regression
models with factors suggested by IBM Watson Analytics. For that purpose, we have run the
simulation test in Excel 2013. The results are shown in the Table 14. We have used the 30 last
forecasted values of each stock for an imitation of real-life trading.
Table 14. Results of the Simulation of IBM predictive models.
Trading simulation (Factor regression)
Model | Factors | Profitability
Apple 1 | DJI, USDCNY | 0.4529254
Apple 2 | Brent, Nasdaq 100 | 0.426352
Exxon Mobil 1 | Gold, Futsee 100 | 0.0992644
Exxon Mobil 2 | Gold, DJI | 0.4689172
Exxon Mobil 3 | SP 500, Gold | 0.4479888
IBM 1 | Brent, NKK225 | 0.1694079
IBM 2 | NASDAQ100, USDZAR | 0.1816904
PG | USD/JPY, Brent | -0.0611271
Bank of America | Natural Gas, NASDAQ100 | 0.3798725
Chevron 1 | Futsee 100, Gold | 0.1907649
Chevron 2 | NASDAQ100, USDZAR | 0.5317324
Walmart 1 | USDZAR, NKK 225 | 0.0200946
Walmart 2 | NKK 225, Futsee 100 | 0.2398046
Walmart 3 | Brent, USD/JPY | -0.0270778
Coke 1 | Gold, Natural Gas | 0.0003398
Coke 2 | Gold, USD/CNY | -0.1652693
We have ambivalent results. On the one hand, some of the models have demonstrated superior
results during the simulation (Apple 1, Apple 2, Exxon Mobil 2, Exxon Mobil 3, and Bank of
America); on the other hand, three models have demonstrated negative results (P&G, Walmart
3, and Coke 2), and one has shown negligibly small profitability (Walmart 1). The lowest results
were demonstrated by those models which turned out to be insignificant (IBM 1, IBM 2). As
mentioned before, the reason for these results could be the absence of some important factors in
the dataset.
Overall, the IBM Watson generated models have shown results that exceed any others in terms
of potential profitability. The average return is 20%, which is considerably better than that of the CAPM.
However, there is a problem of separating the profitable models from the unprofitable ones, and the
stability of the desirable performance over time is still in question.
3.2.3.2 Models for currency exchange rate forecasting.
For the currency exchange rates, we have followed the same two-step procedure as for stocks:
choosing the factors which IBM Watson Analytics suggests as the best predictors, and building
two-factor regressions using the Ordinary Least Squares method in the Gretl statistical package.
In the Table 15, we can see which variables were chosen as predictors and which were
chosen as targets. The results of applying the IBM Watson Analytics Predict function are shown in
the Appendix 3.
Table 15. IBM Watson for currencies (source of data: Finam).
Targets - Currencies: 1. USD/CAD; 2. USD/JPY; 3. USD/ZAR; 4. USD/NOK; 5. USD/CNY; 6. USD/RUB; 7. EUR/USD.
Input - Prices of stock: 12. Exxon Mobil; 13. Chevron; 14. BAC; 15. IBM; 16. P&G; 17. Walmart; 18. Apple.
Input - Stock Indices & resource prices: 1. S&P 500; 2. DJI; 3. RTS; 4. Nikkei; 5. CSI; 6. FTSE; 7. Shanghai; 8. NASDAQ; 9. Gold; 10. Natural Gas; 11. Brent.
As an example: after choosing the target and input variables, IBM Watson Analytics displays the
results as shown in Figure 5.
Figure 5. Screenshot of Watson Analytics Predictive function results.
The colored circles represent combinations of two predictive factors (stock indices, stock
prices, or prices of resources). The closer a circle is to the center, the higher the predictive
power. Using the suggested drivers of the predicted values, we have built two-factor regression models
in the Gretl statistical package for each of the observed currency exchange rates. The results are
presented in the Table 16.
Table 16. Description of currency exchange rate models.
Model | Parameters (Coefficient, Sig.) | R-squared | MPE | MAPE | Predictive Power (%)
EUR/USD 1 | const 1.02361 (<0.0001); Gold 0.000299935 (<0.0001); PG -0.00339848 (<0.0001) | 0.4096 | 2.8818 | 2.8818 | 63.4
USD/CNY 1 | const 6.94919 (<0.0001); Brent -0.0110162 (<0.0001); Shanghai -1.60684e-05 (0.0335) | 0.8981 | 0.0547 | 0.0547 | 96.0
USD/CNY 2 | const 7.22139 (<0.0001); Brent -0.00990401 (<0.0001); NKK225 -2.03678e-05 (<0.0001) | 0.9198 | -0.4768 | 0.4768 | 95.7
USD/JPY 1 | const 103.254 (<0.0001); BankAmerica 1.73662 (<0.0001); Gold -0.00945173 (<0.0001) | 0.8242 | -3.1761 | 3.1761 | 93.5
USD/JPY 2 | const 109.078 (<0.0001); Gold -0.020616 (<0.0001); NKK225 0.00184703 (<0.0001) | 0.8642 | -1.8067 | 1.8067 | 92.5
USD/NOK 1 | const 14.2993 (<0.0001); Gold -0.00358985 (<0.0001); Natural Gas -0.776106 (<0.0001) | 0.8408 | -0.9656 | 0.9656 | 90.9
USD/NOK 2 | const 9.92649 (<0.0001); Natural Gas -0.0331292 (0.0515); Brent -0.0323285 (<0.0001) | 0.8196 | -3.5709 | 3.5709 | 93.3
USD/ZAR | const 21.3444 (<0.0001); Natural Gas -0.977925 (<0.0001); Brent -0.108871 (<0.0001) | 0.9076 | -3.6134 | 3.6134 | 95.6
USD/RUB | const 104.251 (<0.0001); Brent -0.564791 (<0.0001); Shanghai -0.00349858 (<0.0001) | 0.9182 | -5.8449 | 5.8449 | 94.5
As we can see in the Table 16, all of the models are statistically significant and have
high values of R-squared, with the exception of the Euro to USD exchange rate model. In
terms of percentage errors, the models are still not capable of beating the random walk.
As a next step, we have estimated the potential profitability of trading currencies using the
regression models with factors suggested by IBM Watson Analytics. For that purpose, we have
run the simulation test in Excel 2013. We have used the 30 last forecasted values of each currency
exchange rate for an imitation of real-life trading.
The results of the simulation tests are shown in the Table 17. In all cases except for the Euro to
USD, the models were able to produce positive results, but the profitability is much lower than that
of the stock predicting models (10% vs. 26%); this result is quite surprising. It once again raises the
question of the stability of the performance of econometrical models.
Table 17. Results of the simulation for currencies
Trading simulation
Model | Factors | Profitability
EUR/USD | Gold, PG | 0.06876367
USD/CNY 1 | Brent, Shanghai | 0.0136448
USD/CNY 2 | Brent, NKK225 | 0.00554467
USD/JPY 1 | BankAmerica, Gold | 0.13238344
USD/JPY 2 | Gold, NKK225 | 0.12007505
USD/NOK 1 | Gold, Natural Gas | 0.0160639
USD/NOK 2 | Natural Gas, Brent | 0.08606294
USD/ZAR | Natural Gas, Brent | 0.12584994
USD/RUB | Brent, Shanghai | 0.39969893
3.2.3.3 Analysis of the results of stock price forecasting.
We have built a series of predictive models for stock price forecasting and currency
exchange rate forecasting. The first series was based on the random walk model. It was chosen as the
basis for comparison with the other models, as it is necessary for any predictive model to outperform the
random model in order to make at least some sense.
The random walk models have shown unbeatably small deviations of the forecasted values from
the actual ones, but the random walk model fails to correctly predict the direction of change;
therefore it is completely unfit for the purposes of trading. Another type of currency exchange
rate forecasting model we employed is the one-factor regression, which uses the price of the most
exported commodity as a predictor. In terms of deviations of forecasts from actual values, these models
failed to beat the random walk, but in terms of potential profitability, as demonstrated by
the simulation, they easily outperformed the random walk, demonstrating returns on the
level of 20-30%.
The next models we built were CAPM models for the "blue chips", with the S&P 500 index as
the average market asset. The CAPM has shown poor results in terms of both forecasting accuracy
and potential profitability. Its deviation from actual values sometimes exceeded 100%, and only
one model has shown substantial returns during the simulation.
The series of stock predictive models based on the suggestions of IBM Watson Analytics
has demonstrated results which are superior to all other models. In terms of forecasting
accuracy, they beat all models except for the random walk. Additionally, the simulation has
demonstrated high returns for most of the suggested models, with the exception of four models
with negative or unsubstantial returns. The results of currency exchange rate forecasting using
IBM Watson were worse than those of the simple one-factor regression models, though they still beat
the random walk in potential profitability. This raises the question of spurious correlation between the
variables.
Overall, IBM Watson Analytics is capable of suggesting effective predictive models.
However, it doesn't provide users with a detailed description of the nature of the interdependencies
between the variables. Further analysis is required in order to compute actual forecasts of the
variables in question.
3.3 Conclusion of the Chapter 3.
In the Chapter 3, we have identified four analytical platforms of interest, based on the
Gartner's Magic Quadrant for Advanced Analytical Platforms 2016 and the Forrester Wave 2015:
IBM Watson Analytics, SAS Analytics, KNIME, and RapidMiner. The main factor which
determined this choice is that they were identified as market leaders, strong performers, and
visionaries with the biggest potential for growth.
We have evaluated the analytical platforms using the Analytical Hierarchy Process with a set
of six KPIs: Visualization, Simplicity of Use, Predictive Analytics capabilities, Econometric
Modeling capabilities, Textual Analytics capabilities, and Social Media Analytics capabilities.
The results have shown that the most preferable analytical platforms for stock price forecasting are
IBM Watson and SAS Analytics. However, IBM scored a bit better, so we chose it as the
analytical platform of choice for stock market forecasting.
We have then examined more deeply how IBM Watson Analytics could be combined with
statistical packages. For that purpose, we have built a set of models: theoretically based ones and
those suggested by Watson Analytics. The series of stock predictive models based on the suggestions
of IBM Watson Analytics demonstrated results superior to all other models: in terms of forecasting
accuracy they beat all models except for the random walk, and the simulation demonstrated high
returns for most of the suggested models, with the exception of four models with negative or
unsubstantial returns. The results of currency exchange rate forecasting using IBM Watson were
worse than those of the simple one-factor regression models, though they still beat the random walk
in potential profitability, which raises the question of spurious correlation between the variables.
Overall, IBM Watson Analytics is capable of suggesting effective predictive models.
However, it doesn't provide users with a detailed description of the nature of the interdependencies
between the variables, and further analysis is required in order to compute actual forecasts of the
variables in question.
Final Conclusions
Discussion of the findings.
In the course of this research, we have completed the set of objectives stated in the
Research Framework chapter.
First of all, we have evaluated four analytical platforms of interest, based on the Gartner's
Magic Quadrant for Advanced Analytical Platforms 2016 and the Forrester Wave 2015: IBM Watson
Analytics, SAS Analytics, KNIME, and RapidMiner. The main factor which determined this
choice is that they were identified as market leaders, strong performers, and visionaries with the
biggest potential for growth.
Our next objectives were the evaluation and comparison of the analytical platforms based
on their ability to generate predictive models for stock price forecasting.
For the purposes of the evaluation, we have used a set of six KPIs: Visualization,
Simplicity of Use, Predictive Analytics capabilities, Econometric Modeling capabilities, Textual
Analytics capabilities, and Social Media Analytics capabilities. The result of applying the Analytical
Hierarchy Process has demonstrated that IBM Watson and SAS Analytics are the most
appropriate tools when it comes to forecasting the stock market. The whole ranking is shown in the
Table 18.
Table 18. Ranking of Analytical platforms.
Analytical platform | Priority | Rank
IBM Watson | 29.4% | 1
SAS | 29.0% | 2
KNIME | 21.6% | 3
RapidMiner | 20.0% | 4
IBM Watson Analytics has beaten SAS Analytics only by a hair. IBM Watson beats SAS in
simplicity of use, but SAS wins when it comes to the range of econometrical and statistical tools
which it offers to users. The ability to suggest predictive factors without preliminary analysis is
what distinguishes IBM Watson and SAS from the others. They are superior in their ability to
conduct the predictive analytics process, while the other platforms require statistical expertise in order
to be used to the full extent.
Our final objectives were to construct, evaluate and compare the results of theoretically
based econometric predictive models and IBM Watson Analytics suggested models. The results
have shown that in terms of deviations of forecasts from the actual values of the observed variables
(measured in terms of Mean Absolute Percentage Errors), the random walk is unbeatable.
However, when it comes to the potential profitability of the models (assessed through a trading
simulation), the theoretically based models have shown worse results than the IBM Watson Analytics
suggested models, with the exception of the models based on the prices of the most exported
commodities. This result could be explained by the fact that IBM Watson Analytics didn't
specify the nature of the interdependencies between the variables, meaning that further analysis is
required in order to determine the exact econometric equation.
Overall, the effectiveness of IBM Watson Analytics as a tool for suggesting predictive
models was confirmed.
To sum up, we provide direct answers to the research questions, as shown in the
Table 20.
Table 20. Research Questions and answers
Research question | Answer
Which analytical platforms are a better fit for the purposes of stock market forecasting? | IBM Watson Analytics and SAS.
Does IBM Watson Analytics suggest effective predictive models for stock forecasting, in comparison with standard theoretically based econometric models? | Yes, IBM Watson Analytics suggests effective predictive models; however, further analysis is required in order to build the most effective predictive model.
Theoretical implications.
1. Using the theoretical part of this work, similar studies of niche analytical platforms
(according to Gartner's Magic Quadrant of advanced analytical platforms), such as Prognoz,
Accenture, Fico, Megaputer, and Levastorm, could be conducted.
2. The research provides a ground for further studies of how different analytical platforms
and analytical software tools could be combined in order to construct predictive models.
3. The research can serve as a base for further studies of how big data challenges in the
financial sector could be tackled using analytical platforms.
4. There are some collateral theoretical results: the theory that currency exchange rates
could be effectively predicted using the price of the most exported commodities was
confirmed; however, these models have limited applicability, since they can predict exchange rates
only for those currencies which are strongly connected to one particular commodity. In other
words, this applies only to resource-exporting economies.
5. The inability of the CAPM to adequately predict stock prices even on a developed stock
market was confirmed; therefore, the Efficient Market Hypothesis is not met on the US stock
market.
6. The research has both confirmed and questioned the unbeatable random walk: in terms of
the deviation measures, the random walk remains unbeatable, but from the perspective of
forecasting the direction of change, it is outperformed both by theoretically based models and by
those that were suggested by IBM Watson Analytics.
Managerial implications.
1. This research provides interested parties (traders) with recommendations regarding
which analytical platforms to use for the purposes of stock price forecasting.
2. The research provides individual traders with tight budget constraints with a no-cost
combination of analytical tools (IBM Watson Analytics as a guide, and a statistical package
(Gretl) for the construction of the final model). This combination could prove to be quite
effective, since IBM Watson Analytics is the only tool capable of suggesting predictive
models without preliminary theoretical work.
3. The study has identified the analytical functions which an analytical platform should be able
to perform in order to address the business tasks of financial organizations.
4. The study provides the criteria by which analytical platforms can be chosen.
5. The study has contributed to the analysis of the market of financial analytics.
Limitations.
1. The Analytical Hierarchy Process embeds some level of subjectivity: the pairwise comparisons of
the criteria and alternatives could vary depending on the expert.
2. Only four out of many analytical platforms were chosen.
3. This study was conducted with the use of open-source data gathered from the Finam
website. Access to more variables would improve the ability of Watson Analytics to
generate better predictive models.
4. All predictive models were estimated under the assumption that an investor has real-time
access to all needed information and can react instantly, in accordance with the chosen model.
5. Finally, our simulations were run under the assumption that an investor has instant access
to all information needed for the model building, and that an investor can strike deals instantly,
before the market reacts to the changes.
List of references
1. (2012, June 26). Small and midsize companies look to make big gains with "big data," according to recent poll conducted on behalf of SAP.
2. Abdullah, L., J. Sunadia, and T. Imran. 2013. Ranking of human capital indicators using analytical hierarchy process. Paper presented at the Evaluation of Learning for Performance Improvement International Conference, Malaysia, 25-26 February 2013.
3. Antweiler, W. and Frank, M.Z. 2004. Is all that talk just noise? The information content of internet stock message boards. Journal of Finance 59 (3): 1259-1294.
4. Bologa, A., R. Bologa, and A. Florea. 2010. Big Data and Specific Analysis Methods for Insurance Fraud Detection. Database Systems Journal 4 (4): 30-39.
5. Cao, M., R. Chychyla, and Stewart, T. 2015. Big Data Analytics in Financial Statement Audit. Accounting Horizons 29 (2): 423-429.
6. Chung, W. 2014. BizPro: Extracting and categorizing business intelligence factors from textual news articles. International Journal of Information Management 34 (2): 272-284.
7. Cukier, K. 2013. The Economist, Data, data everywhere: A special report on managing information. February 25. Retrieved from http://www.economist.com/node/15557443
8. Curthberston, K. 1996. Quantitative Financial Economics. New York: John Wiley & Sons Inc.
9. Domenico, F., Kenneth, R., and Barbara, R. 2015. Can oil prices forecast exchange rates? An empirical analysis of the relationship between commodity prices and exchange rates. Journal of International Money and Finance 54: 116-141.
10. Doug, H. Gartner Advanced Analytics Quadrant 2015: Gainers, Losers.
11. Earley, E. 2015. Data analytics in auditing: Opportunities and challenges. Business Horizons 58: 493-500.
12. Elliot, G., and A. Timmermann. 2013. Handbook of Economic Forecasting. Elsevier Science and Technology Books, Inc.
13. Fan, J., Han, F., and Liu, H. 2014. Challenges of big data analysis. National Science Review 1 (2): 293-314.
14. Finam. http://www.finam.ru/analysis/quotes/?0=&t=8315698
15. Financial Analytics Market - Worldwide Market Forecasts (2013-2018). Research and Markets (http://www.researchandmarkets.com/research/6xj66l/financial)
16. Gandomi, A., and Murtaza, H. 2015. Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management 35: 137-144.
17. Gartner IT Glossary (n.d.). Retrieved from http://www.gartner.com/it-glossary/big-data/
18. Gema, B., Jason, J., and Jung, B. 2016. Social big data: Recent achievements and new challenges. Information Fusion 28: 45-59.
19. IBM Corporation website. (https://www.ibm.com/marketplace/cloud/watsonanalytics/purchase/us/en-us#product-header-top)
20. Imad, M., Kelly, B. 2014. The unbeatable random walk in exchange rate forecasting: Reality or myth? Journal of Macroeconomics 40: 69-81.
21. Jagadish, H.V. 2015. Big Data and Science: Myths and Reality. Big Data Research 2: 49-52.
22. Jianzheng, L., Jie, L., Weifeng, L., and Jiansheng, W. 2016. Rethinking big data: A review on the data quality and usage issues. ISPRS Journal of Photogrammetry and Remote Sensing 115: 134-142.
23. Joe, F., and Hair, J. 2007. Knowledge creation in marketing: the role of predictive analytics. European Business Review 19 (4): 303-315.
24. Johan, B., Huina, M., and Xiaojun, Z. 2015. Twitter mood predicts the stock market. Journal of Computational Science 2: 1-8.
25. KNIME website. Documentation. (https://tech.knime.org/documentation)
26. Konishi, S., and Kitagawa, G. 2007. Information Criteria and Statistical Modeling. New York: Springer.
27. Kwan, M. 2014. Big Data's Impact on Trading and Technology. Journal of Trading 9 (1): 54-56.
28. Kyunghee, Y., and Lucas, L. 2015. Big Data as Complementary Audit Evidence. Accounting Horizons 29: 431-438.
29. Mark, E. 2006. Fragile digital data in danger of fading past history's reach. Atlanta Journal Constitution, June 7, p. A1.
30. Matlis, J. 2006. Predictive Analytics. Computerworld, October 9.
31. Meese, R., Rogoff, K. 1983. Empirical exchange rate models of the seventies: do they fit out of sample? Journal of International Economy 14: 3-24.
32. Microsoft Azure Machine Learning website. (https://azure.microsoft.com/en-gb/pricing/details/machine-learning/)
33. Min, C., and Stewart, R. 2015. Big Data Analytics in Financial Statement Audits. Accounting Horizons 29 (2): 423-429.
34. Niemira, P.M., and G. F. Zukowski. 1998. Trading the Fundamentals. New York: John Wiley & Sons Inc.
35. Overview diagram of Microsoft Azure Machine Learning Capabilities. 2016. (https://azure.microsoft.com/en-us/documentation/articles/machine-learning-studio-overview-diagram/)
36. Pozi, S. 2014. Big Data, Big Opportunities. Best's Reviews. March.
37. Ramulkan, R. 2015. Financial Executive.
38. RapidMiner Studio Manual. 2014. (http://docs.rapidminer.com/downloads/RapidMiner-v6-user-manual.pdf)
39. RapidMiner website. (https://rapidminer.com/products/comparison/)
40. Ruta, D. 2014. Automated trading with machine learning on big data. IEEE Computer Society: 824-30.
41. SAP. Big Data and Smart Trading: How a Real-time Data Platform Maximizes Trading Opportunities; 2012.
42. SAS website. Product Documentation. (http://support.sas.com/documentation/index.html)
43. Schroeck, M., Shockley, R., Smart, J., Romero-Morales, D., and Tufano, P. 2012. Analytics: The real-world use of big data. How innovative enterprises extract value from uncertain data. IBM Institute for Business Value. Retrieved from http://www-03.ibm.com/systems/hu/resources/the real word use of big data.pdf
44. Schwager, J. D. 1996. Getting Started in Technical Analysis. New York: John Wiley & Sons Inc.
45. Shmueli, G. 2010. To Explain or to Predict? Statistical Science 25 (3): 289-310.
46. Smith, K. 2015. Big Data Discoveries. Best's Reviews. November.
47. Spandan, G., Soham, R., and Satyajit, C. 2014. News Analytics and Sentiment Analysis to Predict Stock Price Trends. International Journal of Computer Science and Information Technologies 5 (3): 3595-3604.
48. Srivastava, U., and Gopalkrishnan, S. 2015. Impact of Big Data Analytics on Banking Sector: Learning for Indian Banks. Procedia Computer Science 50: 643-52.
49. TechAmerica Foundation's Federal Big Data Commission. 2012. Demystifying big data: A practical guide to transforming the business of Government. Retrieved from http://www.techamerica.org/Docs/fileManager.cfm?f=techamerica-bigdatareport-final.pdf
50. Ventana Research. 2016. Five keys to choosing a comprehensive analytics platform.
51. Ventana Research. 2016. Gaining the edge in Banking with Business Analytics.
52. Ventana Research. 2016. Perspective Business Analytics.
53. Wu, H., Harris, W., Gongjun, Y., Vasudeva Akula, C., Jiancheng, S. 2015. A novel social media competitive analytics framework with sentiment benchmarks. Information & Management 52: 801-812.
54. Xinhui, T., Rui, H., Lei, W., Gang, L., and Jianfeng, Z. 2015. Latency critical big data computing in finance. The Journal of Finance and Data Science 1: 33-41.
Appendix 1. Specifications of Models.
Model 1: OLS, using observations 2010-02-01:2016-03-21 (T = 321)
Dependent variable: USDCAD
Coefficient
Std. Error
1.46591
0.0121235
−0.0041492 0.000127235
const
Brent
Mean dependent var
Sum squared resid
R-squared
F(1, 319)
Log-likelihood
Schwarz criterion
rho
1.084380
1.034060
0.769249
1063.443
465.4615
−919.3801
0.970162
t-ratio
120.9144
−32.6105
S.D. dependent var
S.E. of regression
Adjusted R-squared
P-value(F)
Akaike criterion
Hannan-Quinn
Durbin-Watson
p-value
<0.0001
<0.0001
***
***
0.118339
0.056935
0.768526
1.3e-103
−926.9230
−923.9113
0.050208
Model 2: OLS, using observations 2010-02-01:2016-03-21 (T = 321)
Dependent variable: USDRUB
Coefficient
83.0842
−0.48308
const
Brent
Mean dependent var
Sum squared resid
R-squared
F(1, 319)
Log-likelihood
Schwarz criterion
rho
Std. Error
1.40507
0.014746
38.66361
13889.34
0.770869
1073.216
−1060.153
2131.848
0.959764
t-ratio
59.1318
−32.7600
S.D. dependent var
S.E. of regression
Adjusted R-squared
P-value(F)
Akaike criterion
Hannan-Quinn
Durbin-Watson
p-value
<0.0001
<0.0001
***
***
13.76334
6.598503
0.770151
4.4e-104
2124.305
2127.317
0.057043
Model 2: OLS, using observations 2014-11-10:2016-03-21 (T = 72)
Dependent variable: USDNOK
const
Coefficient
10.1061
Std. Error
0.126586
t-ratio
79.8355
p-value
<0.0001
***
51
Brent
−0.039167
Mean dependent var
Sum squared resid
R-squared
F(1, 70)
Log-likelihood
Schwarz criterion
rho
0.00237413
8.067018
3.766776
0.795420
272.1652
4.052507
0.448319
0.802283
−16.4974
S.D. dependent var
S.E. of regression
Adjusted R-squared
P-value(F)
Akaike criterion
Hannan-Quinn
Durbin-Watson
<0.0001
***
0.509242
0.231972
0.792498
8.06e-26
−4.105013
−2.292318
0.378347
Model 1: OLS, using observations 2014-11-10:2016-03-21 (T = 72)
Dependent variable: USDZAR
const
GOLD
Mean dependent var
Sum squared resid
R-squared
F(1, 70)
Log-likelihood
Schwarz criterion
rho
Coefficient
26.4943
−0.011423
Std. Error
3.52805
0.00301772
13.15584
152.4363
0.169913
14.32849
−129.1665
266.8863
0.983043
t-ratio
7.5096
−3.7853
S.D. dependent var
S.E. of regression
Adjusted R-squared
P-value(F)
Akaike criterion
Hannan-Quinn
Durbin-Watson
p-value
<0.0001
0.0003
***
***
1.608249
1.475690
0.158054
0.000321
262.3330
264.1457
0.063645
Model 1: OLS, using observations 2015-02-03:2016-03-04 (T = 284)
Dependent variable: BAC
Coefficient
Std. Error
0.000181202 0.000738767
1.32574
0.0715066
const
SP
Mean dependent var
Sum squared resid
R-squared
F(1, 282)
Log-likelihood
Schwarz criterion
rho
−0.002243
0.042341
0.549330
343.7346
848.1800
−1685.062
0.060132
t-ratio
0.2453
18.5401
S.D. dependent var
S.E. of regression
Adjusted R-squared
P-value(F)
Akaike criterion
Hannan-Quinn
Durbin-Watson
p-value
0.8064
<0.0001
***
0.018220
0.012253
0.547732
9.99e-51
−1692.360
−1689.434
1.879509
Model 2: OLS, using observations 2015-02-03:2016-03-04 (T = 284)
Dependent variable: IBM
const
Coefficient
Std. Error
−0.00033603 0.000621024
3
t-ratio
−0.5411
p-value
0.5889
52
SP
0.919689
Mean dependent var
Sum squared resid
R-squared
F(1, 282)
Log-likelihood
Schwarz criterion
rho
0.06011
−0.002018
0.029920
0.453587
234.0931
897.4862
−1783.674
0.041344
15.3001
<0.0001
S.D. dependent var
S.E. of regression
Adjusted R-squared
P-value(F)
Akaike criterion
Hannan-Quinn
Durbin-Watson
***
0.013910
0.010300
0.451649
6.87e-39
−1790.972
−1788.046
1.913474
Model 3: OLS, using observations 2015-02-03:2016-03-04 (T = 284)
Dependent variable: MSFT
Coefficient
Std. Error
0.00130152 0.000748926
1.13318
0.0724899
const
SP
Mean dependent var
Sum squared resid
R-squared
F(1, 282)
Log-likelihood
Schwarz criterion
rho
−0.000771
0.043513
0.464251
244.3658
844.3013
−1677.305
0.001750
t-ratio
1.7378
15.6322
p-value
0.0833
<0.0001
S.D. dependent var
S.E. of regression
Adjusted R-squared
P-value(F)
Akaike criterion
Hannan-Quinn
Durbin-Watson
*
***
0.016941
0.012422
0.462351
4.22e-40
−1684.603
−1681.677
1.995791
Model 4: OLS, using observations 2015-02-03:2016-03-04 (T = 284)
Dependent variable: PG
const
SP
Coefficient
Std. Error
−0.00079077 0.000448067
5
0.671332
Mean dependent var
Sum squared resid
R-squared
F(1, 282)
Log-likelihood
Schwarz criterion
rho
0.0433692
−0.002018
0.015575
0.459370
239.6139
990.1916
−1969.085
0.108871
t-ratio
−1.7649
p-value
0.0787
*
15.4795
<0.0001
***
S.D. dependent var
S.E. of regression
Adjusted R-squared
P-value(F)
Akaike criterion
Hannan-Quinn
Durbin-Watson
0.010090
0.007432
0.457453
1.52e-39
−1976.383
−1973.457
1.781336
Model 5: OLS, using observations 2015-02-03:2016-03-04 (T = 284)
Dependent variable: Wallmart

            Coefficient   Std. Error    t-ratio   p-value
  const     −0.0015747    0.000696924   −2.2595   0.0246    **
  SP        0.592545      0.0674565      8.7841   <0.0001   ***

Mean dependent var   −0.002658   S.D. dependent var   0.013022
Sum squared resid    0.037680    S.E. of regression   0.011559
R-squared            0.214836    Adjusted R-squared   0.212051
F(1, 282)            77.16046    P-value(F)           1.56e-16
Log-likelihood       864.7388    Akaike criterion     −1725.478
Schwarz criterion    −1718.180   Hannan-Quinn         −1722.552
rho                  0.059662    Durbin-Watson        1.878631
Model 6: OLS, using observations 2015-02-03:2016-03-04 (T = 284)
Dependent variable: Apple

            Coefficient     Std. Error    t-ratio   p-value
  const     −5.80202e-06    0.000709582   −0.0082   0.9935
  SP        1.14417         0.0686817     16.6590   <0.0001   ***

Mean dependent var   −0.002098   S.D. dependent var   0.016549
Sum squared resid    0.039062    S.E. of regression   0.011769
R-squared            0.495999    Adjusted R-squared   0.494212
F(1, 282)            277.5229    P-value(F)           7.41e-44
Log-likelihood       859.6272    Akaike criterion     −1715.254
Schwarz criterion    −1707.956   Hannan-Quinn         −1712.329
rho                  −0.100408   Durbin-Watson        2.195459
Appendix 2. Specification of models suggested by Watson Analytics
Model 2: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: EURUSD

               Coefficient     Std. Error    t-ratio    p-value
  const        2.43518         0.0506615      48.0676   <0.0001   ***
  Footse100    −8.46547e-05    3.6118e-06    −23.4384   <0.0001   ***
  USDNOK       −0.0954061      0.00357196    −26.7098   <0.0001   ***

Mean dependent var   1.101956    S.D. dependent var   0.023066
Sum squared resid    0.027962    S.E. of regression   0.011248
R-squared            0.764319    Adjusted R-squared   0.762186
F(2, 221)            358.3535    P-value(F)           4.38e-70
Log-likelihood       688.8764    Akaike criterion     −1371.753
Schwarz criterion    −1361.518   Hannan-Quinn         −1367.621
rho                  0.811820    Durbin-Watson        0.380038
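The two- and three-variable specifications in this appendix can be checked the same way. Below is a sketch under the same assumptions (a hypothetical fx_data.csv whose columns are named as in the tables) that also extracts the individual statistics reported above.

# Minimal sketch for a two-regressor model such as EURUSD on Footse100
# and USDNOK. File and column names are assumptions, not the thesis's data.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

df = pd.read_csv("fx_data.csv", parse_dates=["Date"], index_col="Date")

X = sm.add_constant(df[["Footse100", "USDNOK"]])
res = sm.OLS(df["EURUSD"], X).fit()

print(res.params)                       # const and slope coefficients
print(res.rsquared, res.rsquared_adj)   # R-squared / adjusted R-squared
print(res.aic, res.bic)                 # Akaike / Schwarz criteria
print(durbin_watson(res.resid))         # values near 0 flag the strong
                                        # residual autocorrelation seen above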
Model 3: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: USDJPY

                 Coefficient   Std. Error   t-ratio   p-value
  const          111.292       2.97802      37.3710   <0.0001   ***
  BankAmerica    1.72864       0.062178     27.8014   <0.0001   ***
  Coke           −0.451838     0.0582075    −7.7625   <0.0001   ***

Mean dependent var   120.0950    S.D. dependent var   3.329135
Sum squared resid    379.9253    S.E. of regression   1.311152
R-squared            0.846280    Adjusted R-squared   0.844889
F(2, 221)            608.3391    P-value(F)           1.36e-90
Log-likelihood       −377.0150   Akaike criterion     760.0301
Schwarz criterion    770.2650    Hannan-Quinn         764.1614
rho                  0.883048    Durbin-Watson        0.230867
Model 4: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: USDJPY

                 Coefficient    Std. Error   t-ratio   p-value
  const          103.254        2.96398      34.8363   <0.0001   ***
  BankAmerica    1.73662        0.0716346    24.2427   <0.0001   ***
  Gold           −0.00945173    0.00189369   −4.9912   <0.0001   ***

Mean dependent var   120.0950    S.D. dependent var   3.329135
Sum squared resid    434.5323    S.E. of regression   1.402216
R-squared            0.824186    Adjusted R-squared   0.822595
F(2, 221)            518.0038    P-value(F)           3.79e-84
Log-likelihood       −392.0561   Akaike criterion     790.1123
Schwarz criterion    800.3472    Hannan-Quinn         794.2436
rho                  0.917690    Durbin-Watson        0.176774
Model 5: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: USDJPY

             Coefficient   Std. Error     t-ratio    p-value
  const      109.078       2.38948         45.6492   <0.0001   ***
  Gold       −0.020616     0.00148216     −13.9094   <0.0001   ***
  NKK225     0.00184703    6.42593e-05     28.7434   <0.0001   ***

Mean dependent var   120.0950    S.D. dependent var   3.329135
Sum squared resid    335.5763    S.E. of regression   1.232252
R-squared            0.864224    Adjusted R-squared   0.862995
F(2, 221)            703.3392    P-value(F)           1.50e-96
Log-likelihood       −363.1130   Akaike criterion     732.2260
Schwarz criterion    742.4610    Hannan-Quinn         736.3573
rho                  0.862122    Durbin-Watson        0.271155
Model 6: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: USDNOK

            Coefficient    Std. Error     t-ratio    p-value
  const     11.834         0.175394        67.4711   <0.0001   ***
  Gold      −0.00182426    0.000159866    −11.4112   <0.0001   ***
  Brent     −0.0299062     0.000884068    −33.8280   <0.0001   ***

Mean dependent var   8.245108    S.D. dependent var   0.389453
Sum squared resid    3.847638    S.E. of regression   0.131947
R-squared            0.886243    Adjusted R-squared   0.885213
F(2, 221)            860.8666    P-value(F)           4.8e-105
Log-likelihood       137.3467    Akaike criterion     −268.6934
Schwarz criterion    −258.4584   Hannan-Quinn         −264.5620
rho                  0.845472    Durbin-Watson        0.303042
Model 7: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: USDNOK

               Coefficient     Std. Error     t-ratio    p-value
  const        22.7357         0.356379        63.7964   <0.0001   ***
  EURUSD       −8.0025         0.299609       −26.7098   <0.0001   ***
  Footse100    −0.000878508    1.79426e-05    −48.9621   <0.0001   ***

Mean dependent var   8.245108    S.D. dependent var   0.389453
Sum squared resid    2.345393    S.E. of regression   0.103018
R-squared            0.930657    Adjusted R-squared   0.930030
F(2, 221)            1483.035    P-value(F)           8.5e-129
Log-likelihood       192.7874    Akaike criterion     −379.5748
Schwarz criterion    −369.3398   Hannan-Quinn         −375.4434
rho                  0.789141    Durbin-Watson        0.421785
Model 8: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: USDRUB

               Coefficient    Std. Error     t-ratio    p-value
  const        104.251        0.960429       108.5464   <0.0001   ***
  Brent        −0.564791      0.0220999      −25.5563   <0.0001   ***
  Shanghai     −0.00349858    0.000409167     −8.5505   <0.0001   ***

Mean dependent var   63.87961    S.D. dependent var   7.947528
Sum squared resid    1152.483    S.E. of regression   2.283606
R-squared            0.918179    Adjusted R-squared   0.917438
F(2, 221)            1240.007    P-value(F)           7.4e-121
Log-likelihood       −501.3014   Akaike criterion     1008.603
Schwarz criterion    1018.838    Hannan-Quinn         1012.734
rho                  0.903221    Durbin-Watson        0.178160
Model 9: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: USDRUB

            Coefficient   Std. Error   t-ratio    p-value
  const     118.242       1.30709       90.4620   <0.0001   ***
  Brent     −0.398387     0.0219356    −18.1617   <0.0001   ***
  RTSI      −0.0406336    0.00247756   −16.4007   <0.0001   ***

Mean dependent var   63.87961    S.D. dependent var   7.947528
Sum squared resid    691.7760    S.E. of regression   1.769239
R-squared            0.950887    Adjusted R-squared   0.950443
F(2, 221)            2139.413    P-value(F)           2.4e-145
Log-likelihood       −444.1352   Akaike criterion     894.2705
Schwarz criterion    904.5054    Hannan-Quinn         898.4018
rho                  0.860782    Durbin-Watson        0.275189
Model 10: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: USDZAR

                 Coefficient   Std. Error    t-ratio    p-value
  const          21.3444       0.213486       99.9806   <0.0001   ***
  Brent          −0.108871     0.00500692    −21.7442   <0.0001   ***
  NaturalGas     −0.977925     0.143414       −6.8189   <0.0001   ***

Mean dependent var   13.55606    S.D. dependent var   1.533969
Sum squared resid    48.50850    S.E. of regression   0.468503
R-squared            0.907556    Adjusted R-squared   0.906719
F(2, 221)            1084.815    P-value(F)           5.4e-115
Log-likelihood       −146.4926   Akaike criterion     298.9853
Schwarz criterion    309.2202    Hannan-Quinn         303.1166
rho                  0.879994    Durbin-Watson        0.205604
Model 11: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: USDZAR

            Coefficient    Std. Error     t-ratio    p-value
  const     19.7559        0.68409         28.8792   <0.0001   ***
  Brent     −0.137358      0.00344813     −39.8356   <0.0001   ***
  Gold      0.000517846    0.000623526      0.8305   0.4071

Mean dependent var   13.55606    S.D. dependent var   1.533969
Sum squared resid    58.53171    S.E. of regression   0.514635
R-squared            0.888454    Adjusted R-squared   0.887445
F(2, 221)            880.1244    P-value(F)           5.5e-106
Log-likelihood       −167.5296   Akaike criterion     341.0591
Schwarz criterion    351.2940    Hannan-Quinn         345.1904
rho                  0.898736    Durbin-Watson        0.164108
Model 15: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: IBM

             Coefficient    Std. Error     t-ratio   p-value
  const      78.6593        5.23793        15.0173   <0.0001   ***
  Brent      1.19685        0.043626       27.4344   <0.0001   ***
  NKK225     0.000636722    0.000342023     1.8616   0.0640    *

Mean dependent var   149.9098    S.D. dependent var   14.20635
Sum squared resid    5647.317    S.E. of regression   5.055044
R-squared            0.874521    Adjusted R-squared   0.873385
F(2, 221)            770.1224    P-value(F)           2.5e-100
Log-likelihood       −679.2987   Akaike criterion     1364.597
Schwarz criterion    1374.832    Hannan-Quinn         1368.729
rho                  0.917691    Durbin-Watson        0.194187
Model 16: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: IBM

                Coefficient   Std. Error   t-ratio    p-value
  const         241.98        13.0694       18.5150   <0.0001   ***
  NASDAQ100     0.00448579    0.00264733     1.6945   0.0916    *
  USDZAR        −8.25771      0.273081     −30.2391   <0.0001   ***

Mean dependent var   149.9098    S.D. dependent var   14.20635
Sum squared resid    8215.836    S.E. of regression   6.097190
R-squared            0.817450    Adjusted R-squared   0.815798
F(2, 221)            494.8132    P-value(F)           2.41e-82
Log-likelihood       −721.2856   Akaike criterion     1448.571
Schwarz criterion    1458.806    Hannan-Quinn         1452.702
rho                  0.938997    Durbin-Watson        0.120422
Model 22: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: Apple

                Coefficient   Std. Error   t-ratio   p-value
  const         −9.95716      5.24672      −1.8978   0.0590    *
  Brent         0.879771      0.0182226    48.2790   <0.0001   ***
  NASDAQ100     0.0186525     0.00122047   15.2831   <0.0001   ***

Mean dependent var   116.2154    S.D. dependent var   10.84963
Sum squared resid    1712.286    S.E. of regression   2.783505
R-squared            0.934771    Adjusted R-squared   0.934181
F(2, 221)            1583.529    P-value(F)           9.9e-132
Log-likelihood       −545.6434   Akaike criterion     1097.287
Schwarz criterion    1107.522    Hannan-Quinn         1101.418
rho                  0.829285    Durbin-Watson        0.335655

Model 23: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: Wallmart

             Coefficient    Std. Error     t-ratio    p-value
  const      191.434        9.74956         19.6351   <0.0001   ***
  USDZAR     −5.50512       0.303515       −18.1379   <0.0001   ***
  NKK225     −0.00250334    0.000344423     −7.2682   <0.0001   ***

Mean dependent var   69.57540    S.D. dependent var   8.425007
Sum squared resid    5900.174    S.E. of regression   5.166974
R-squared            0.627249    Adjusted R-squared   0.623875
F(2, 221)            185.9442    P-value(F)           4.38e-48
Log-likelihood       −684.2044   Akaike criterion     1374.409
Schwarz criterion    1384.644    Hannan-Quinn         1378.540
rho                  0.953880    Durbin-Watson        0.091654
Model 24: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: Wallmart

                Coefficient    Std. Error     t-ratio    p-value
  const         −33.1095       4.52621         −7.3151   <0.0001   ***
  NKK225        −0.00304983    0.000261806    −11.6492   <0.0001   ***
  Footse100     0.0248156      0.000918855     27.0071   <0.0001   ***

Mean dependent var   69.57540    S.D. dependent var   8.425007
Sum squared resid    3414.405    S.E. of regression   3.930623
R-squared            0.784290    Adjusted R-squared   0.782338
F(2, 221)            401.7626    P-value(F)           2.47e-74
Log-likelihood       −622.9428   Akaike criterion     1251.886
Schwarz criterion    1262.121    Hannan-Quinn         1256.017
rho                  0.903227    Durbin-Watson        0.187717
Model 25: OLS, using observations 2015-01-30:2015-09-10 (T = 224)
Dependent variable: Wallmart

             Coefficient   Std. Error   t-ratio   p-value
  const      151.821       13.7927      11.0074   <0.0001   ***
  Brent      0.695071      0.0380797    18.2531   <0.0001   ***
  USDJPY     −0.971299     0.121221     −8.0126   <0.0001   ***

Mean dependent var   69.57540    S.D. dependent var   8.425007
Sum squared resid    6309.686    S.E. of regression   5.343278
R-squared            0.601377    Adjusted R-squared   0.597770
F(2, 221)            166.7043    P-value(F)           7.28e-45
Log-likelihood       −691.7201   Akaike criterion     1389.440
Schwarz criterion    1399.675    Hannan-Quinn         1393.571
rho                  0.952624    Durbin-Watson        0.078185
Model 3: OLS, using observations 1960-01-01:1960-08-11 (T = 224)
Dependent variable: Coke

             Coefficient   Std. Error   t-ratio   p-value
  const      −18.4188      4.88511      −3.7704   0.0002    ***
  USDCNY     7.84844       0.668813     11.7349   <0.0001   ***
  Gold       0.00890342    0.00149152    5.9693   <0.0001   ***

Mean dependent var   41.68634    S.D. dependent var   1.654961
Sum squared resid    362.1419    S.E. of regression   1.280098
R-squared            0.407077    Adjusted R-squared   0.401711
F(2, 221)            75.86486    P-value(F)           8.25e-26
Log-likelihood       −371.6459   Akaike criterion     749.2918
Schwarz criterion    759.5268    Hannan-Quinn         753.4232
rho                  0.950880    Durbin-Watson        0.121149
Model 4: OLS, using observations 1960-01-01:1960-08-11 (T = 224)
Dependent variable: EURUSD

            Coefficient    Std. Error    t-ratio   p-value
  const     1.02361        0.0261078     39.2069   <0.0001   ***
  Gold      0.000299935    2.50379e-05   11.9793   <0.0001   ***
  PG        −0.00339848    0.00035657    −9.5310   <0.0001   ***

Mean dependent var   1.101956    S.D. dependent var   0.023066
Sum squared resid    0.070045    S.E. of regression   0.017803
R-squared            0.409610    Adjusted R-squared   0.404267
F(2, 221)            76.66453    P-value(F)           5.14e-26
Log-likelihood       586.0265    Akaike criterion     −1166.053
Schwarz criterion    −1155.818   Hannan-Quinn         −1161.922
rho                  0.883499    Durbin-Watson        0.238517
Model 6: OLS, using observations 1960-01-01:1960-08-11 (T = 224)
Dependent variable: USDCNY

               Coefficient     Std. Error     t-ratio    p-value
  const        6.94919         0.0176275      394.2246   <0.0001   ***
  Brent        −0.0110162      0.000405616    −27.1592   <0.0001   ***
  Shanghai     −1.60684e-05    7.50977e-06     −2.1397   0.0335    **

Mean dependent var   6.346921    S.D. dependent var   0.130698
Sum squared resid    0.388227    S.E. of regression   0.041913
R-squared            0.898085    Adjusted R-squared   0.897162
F(2, 221)            973.7338    P-value(F)           2.6e-110
Log-likelihood       394.2327    Akaike criterion     −782.4655
Schwarz criterion    −772.2305   Hannan-Quinn         −778.3341
rho                  0.880092    Durbin-Watson        0.229048
Model 7: OLS, using observations 1960-01-01:1960-08-11 (T = 224)
Dependent variable: USDCNY

             Coefficient     Std. Error     t-ratio    p-value
  const      7.22139         0.0385345      187.4007   <0.0001   ***
  Brent      −0.00990401     0.000320948    −30.8586   <0.0001   ***
  NKK225     −2.03678e-05    2.5162e-06      −8.0946   <0.0001   ***

Mean dependent var   6.346921    S.D. dependent var   0.130698
Sum squared resid    0.305649    S.E. of regression   0.037189
R-squared            0.919763    Adjusted R-squared   0.919037
F(2, 221)            1266.665    P-value(F)           8.6e-122
Log-likelihood       421.0179    Akaike criterion     −836.0358
Schwarz criterion    −825.8008   Hannan-Quinn         −831.9045
rho                  0.844110    Durbin-Watson        0.288563
Model 8: OLS, using observations 1960-01-01:1960-08-11 (T = 224)
Dependent variable: USDJPY

                 Coefficient    Std. Error   t-ratio   p-value
  const          103.254        2.96398      34.8363   <0.0001   ***
  BankAmerica    1.73662        0.0716346    24.2427   <0.0001   ***
  Gold           −0.00945173    0.00189369   −4.9912   <0.0001   ***

Mean dependent var   120.0950    S.D. dependent var   3.329135
Sum squared resid    434.5323    S.E. of regression   1.402216
R-squared            0.824186    Adjusted R-squared   0.822595
F(2, 221)            518.0038    P-value(F)           3.79e-84
Log-likelihood       −392.0561   Akaike criterion     790.1123
Schwarz criterion    800.3472    Hannan-Quinn         794.2436
rho                  0.917690    Durbin-Watson        0.176774
Model 9: OLS, using observations 1960-01-01:1960-08-11 (T = 224)
Dependent variable: USDJPY

             Coefficient   Std. Error     t-ratio    p-value
  const      109.078       2.38948         45.6492   <0.0001   ***
  Gold       −0.020616     0.00148216     −13.9094   <0.0001   ***
  NKK225     0.00184703    6.42593e-05     28.7434   <0.0001   ***

Mean dependent var   120.0950    S.D. dependent var   3.329135
Sum squared resid    335.5763    S.E. of regression   1.232252
R-squared            0.864224    Adjusted R-squared   0.862995
F(2, 221)            703.3392    P-value(F)           1.50e-96
Log-likelihood       −363.1130   Akaike criterion     732.2260
Schwarz criterion    742.4610    Hannan-Quinn         736.3573
rho                  0.862122    Durbin-Watson        0.271155
Model 10: OLS, using observations 1960-01-01:1960-08-11 (T = 224)
Dependent variable: USDNOK

                 Coefficient    Std. Error    t-ratio    p-value
  const          14.2993        0.2173         65.8042   <0.0001   ***
  Gold           −0.00358985    0.00017834    −20.1293   <0.0001   ***
  NaturalGas     −0.776106      0.0282488     −27.4739   <0.0001   ***

Mean dependent var   8.245108    S.D. dependent var   0.389453
Sum squared resid    5.383486    S.E. of regression   0.156076
R-squared            0.840835    Adjusted R-squared   0.839394
F(2, 221)            583.7466    P-value(F)           6.37e-89
Log-likelihood       99.72848    Akaike criterion     −193.4570
Schwarz criterion    −183.2220   Hannan-Quinn         −189.3256
rho                  0.835213    Durbin-Watson        0.328522
Model 11: OLS, using observations 1960-01-01:1960-08-11 (T = 224)
Dependent variable: USDNOK

                 Coefficient   Std. Error   t-ratio    p-value
  const          9.92649       0.0757235    131.0886   <0.0001   ***
  NaturalGas     −0.0331292    0.0508692     −0.6513   0.5156
  Brent          −0.0323285    0.00177596   −18.2034   <0.0001   ***

Mean dependent var   8.245108    S.D. dependent var   0.389453
Sum squared resid    6.102984    S.E. of regression   0.166179
R-squared            0.819562    Adjusted R-squared   0.817929
F(2, 221)            501.8999    P-value(F)           6.67e-83
Log-likelihood       85.67901    Akaike criterion     −165.3580
Schwarz criterion    −155.1231   Hannan-Quinn         −161.2267
rho                  0.888607    Durbin-Watson        0.208145
Model 12: OLS, using observations 1960-01-01:1960-08-11 (T = 224)
Dependent variable: USDZAR

                 Coefficient   Std. Error    t-ratio    p-value
  const          21.3444       0.213486       99.9806   <0.0001   ***
  NaturalGas     −0.977925     0.143414       −6.8189   <0.0001   ***
  Brent          −0.108871     0.00500692    −21.7442   <0.0001   ***

Mean dependent var   13.55606    S.D. dependent var   1.533969
Sum squared resid    48.50850    S.E. of regression   0.468503
R-squared            0.907556    Adjusted R-squared   0.906719
F(2, 221)            1084.815    P-value(F)           5.4e-115
Log-likelihood       −146.4926   Akaike criterion     298.9853
Schwarz criterion    309.2202    Hannan-Quinn         303.1166
rho                  0.879994    Durbin-Watson        0.205604
Model 14: OLS, using observations 1960-01-01:1960-08-11 (T = 224)
Dependent variable: USDRUB

               Coefficient    Std. Error     t-ratio    p-value
  const        104.251        0.960429       108.5464   <0.0001   ***
  Brent        −0.564791      0.0220999      −25.5563   <0.0001   ***
  Shanghai     −0.00349858    0.000409167     −8.5505   <0.0001   ***

Mean dependent var   63.87961    S.D. dependent var   7.947528
Sum squared resid    1152.483    S.E. of regression   2.283606
R-squared            0.918179    Adjusted R-squared   0.917438
F(2, 221)            1240.007    P-value(F)           7.4e-121
Log-likelihood       −501.3014   Akaike criterion     1008.603
Schwarz criterion    1018.838    Hannan-Quinn         1012.734
rho                  0.903221    Durbin-Watson        0.178160
Model 2: OLS, using observations 1960-01-01:1960-09-19 (T = 225)
Dependent variable: Apple

             Coefficient   Std. Error     t-ratio    p-value
  const      390.097       20.0778         19.4293   <0.0001   ***
  DJI        0.00533233    0.000430637     12.3824   <0.0001   ***
  USDCNY     −57.7748      2.19356        −26.3384   <0.0001   ***

Mean dependent var   116.1877    S.D. dependent var   10.83334
Sum squared resid    1890.653    S.E. of regression   2.918297
R-squared            0.928082    Adjusted R-squared   0.927434
F(2, 222)            1432.419    P-value(F)           1.3e-127
Log-likelihood       −558.7261   Akaike criterion     1123.452
Schwarz criterion    1133.700    Hannan-Quinn         1127.588
rho                  0.830495    Durbin-Watson        0.333046
Model 1: OLS, using observations 1960-01-01:1961-03-27 (T = 322)
Dependent variable: USDCAD

            Coefficient    Std. Error     t-ratio    p-value
  const     1.4658         0.0120031      122.1180   <0.0001   ***
  Brent     −0.00414812    0.000126136    −32.8862   <0.0001   ***

Mean dependent var   1.085059    S.D. dependent var   0.118781
Sum squared resid    1.034076    S.E. of regression   0.056846
R-squared            0.771673    Adjusted R-squared   0.770960
F(1, 320)            1081.499    P-value(F)           1.2e-104
Log-likelihood       467.4097    Akaike criterion     −930.8195
Schwarz criterion    −923.2704   Hannan-Quinn         −927.8056
rho                  0.969058    Durbin-Watson        0.051482
Model 3: OLS, using observations 1960-01-01:1960-11-10 (T = 225)
Dependent variable: ExxonMobil

                Coefficient   Std. Error    t-ratio   p-value
  const         −4.57266      3.76875       −1.2133   0.2263
  Footse100     0.00725082    0.000467742   15.5017   <0.0001   ***
  Gold          0.0339528     0.00306931    11.0621   <0.0001   ***

Mean dependent var   81.49080    S.D. dependent var   4.686620
Sum squared resid    1412.159    S.E. of regression   2.522117
R-squared            0.712977    Adjusted R-squared   0.710392
F(2, 222)            275.7292    P-value(F)           6.74e-61
Log-likelihood       −525.8983   Akaike criterion     1057.797
Schwarz criterion    1068.045    Hannan-Quinn         1061.933
rho                  0.884229    Durbin-Watson        0.232393
Model 4: OLS, using observations 1960-01-01:1960-11-10 (T = 225)
Dependent variable: ExxonMobil

            Coefficient   Std. Error    t-ratio    p-value
  const     −46.8102      3.93945       −11.8824   <0.0001   ***
  Gold      0.0405594     0.00226864     17.8782   <0.0001   ***
  DJI       0.00467673    0.00019986     23.4000   <0.0001   ***

Mean dependent var   81.49080    S.D. dependent var   4.686620
Sum squared resid    848.3353    S.E. of regression   1.954822
R-squared            0.827575    Adjusted R-squared   0.826022
F(2, 222)            532.7585    P-value(F)           1.83e-85
Log-likelihood       −468.5684   Akaike criterion     943.1368
Schwarz criterion    953.3851    Hannan-Quinn         947.2731
rho                  0.861211    Durbin-Watson        0.281813
Model 5: OLS, using observations 1960-01-01:1960-11-10 (T = 225)
Dependent variable: ExxonMobil

            Coefficient   Std. Error    t-ratio    p-value
  const     −53.5047      5.14368       −10.4020   <0.0001   ***
  SP500     0.0398747     0.00217159     18.3620   <0.0001   ***
  Gold      0.0463417     0.00262328     17.6655   <0.0001   ***

Mean dependent var   81.49080    S.D. dependent var   4.686620
Sum squared resid    1167.540    S.E. of regression   2.293292
R-squared            0.762696    Adjusted R-squared   0.760559
F(2, 222)            356.7554    P-value(F)           4.56e-70
Log-likelihood       −504.4985   Akaike criterion     1014.997
Schwarz criterion    1025.245    Hannan-Quinn         1019.133
rho                  0.893820    Durbin-Watson        0.215242
Model 9: OLS, using observations 1960-01-01:1960-11-10 (T = 225)
Dependent variable: PG

             Coefficient   Std. Error   t-ratio   p-value
  const      151.348       8.77858      17.2406   <0.0001   ***
  USDJPY     −0.683958     0.0772603    −8.8526   <0.0001   ***
  Brent      0.197202      0.0245568     8.0304   <0.0001   ***

Mean dependent var   78.98476    S.D. dependent var   4.117686
Sum squared resid    2636.217    S.E. of regression   3.445990
R-squared            0.305893    Adjusted R-squared   0.299639
F(2, 222)            48.91763    P-value(F)           2.50e-18
Log-likelihood       −596.1236   Akaike criterion     1198.247
Schwarz criterion    1208.496    Hannan-Quinn         1202.384
rho                  0.949963    Durbin-Watson        0.097331
Model 10: OLS, using observations 1960-01-01:1960-11-10 (T = 225)
Dependent variable: BankAmerica

                 Coefficient   Std. Error     t-ratio    p-value
  const          −19.479       1.29258        −15.0699   <0.0001   ***
  NaturalGas     2.11465       0.123832        17.0768   <0.0001   ***
  NASDAQ100      0.00683298    0.000290473     23.5236   <0.0001   ***

Mean dependent var   15.97807    S.D. dependent var   1.554276
Sum squared resid    103.6922    S.E. of regression   0.683434
R-squared            0.808380    Adjusted R-squared   0.806653
F(2, 222)            468.2703    P-value(F)           2.24e-80
Log-likelihood       −232.1104   Akaike criterion     470.2209
Schwarz criterion    480.4692    Hannan-Quinn         474.3571
rho                  0.908326    Durbin-Watson        0.212804
Model 11: OLS, using observations 1960-01-01:1960-11-10 (T = 225)
Dependent variable: Chevron

                Coefficient   Std. Error     t-ratio    p-value
  const         −76.5478      7.28786        −10.5035   <0.0001   ***
  Gold          0.0393618     0.0059353        6.6318   <0.0001   ***
  Footse100     0.0192772     0.000904501     21.3125   <0.0001   ***

Mean dependent var   93.40311    S.D. dependent var   9.797375
Sum squared resid    5280.668    S.E. of regression   4.877170
R-squared            0.754404    Adjusted R-squared   0.752191
F(2, 222)            340.9617    P-value(F)           2.06e-68
Log-likelihood       −674.2783   Akaike criterion     1354.557
Schwarz criterion    1364.805    Hannan-Quinn         1358.693
rho                  0.938045    Durbin-Watson        0.126310
Model 12: OLS, using observations 1960-01-01:1960-11-10 (T = 225)
Dependent variable: Chevron

                Coefficient   Std. Error   t-ratio    p-value
  const         119.11        15.9567        7.4646   <0.0001   ***
  NASDAQ100     0.0063668     0.00323398     1.9687   0.0502    *
  USDZAR        −3.97567      0.333502     −11.9210   <0.0001   ***

Mean dependent var   93.40311    S.D. dependent var   9.797375
Sum squared resid    12353.99    S.E. of regression   7.459799
R-squared            0.425434    Adjusted R-squared   0.420258
F(2, 222)            82.18939    P-value(F)           1.94e-27
Log-likelihood       −769.8950   Akaike criterion     1545.790
Schwarz criterion    1556.038    Hannan-Quinn         1549.926
rho                  0.979301    Durbin-Watson        0.042312
Model 16: OLS, using observations 1960-01-01:1960-11-10 (T = 225)
Dependent variable: Coke

                 Coefficient   Std. Error   t-ratio    p-value
  const          43.2875       1.47685       29.3108   <0.0001   ***
  Gold           0.00589016    0.00120946     4.8701   <0.0001   ***
  NaturalGas     −3.42148      0.191359     −17.8799   <0.0001   ***

Mean dependent var   41.70929    S.D. dependent var   1.686764
Sum squared resid    250.2125    S.E. of regression   1.061642
R-squared            0.607398    Adjusted R-squared   0.603861
F(2, 222)            171.7292    P-value(F)           8.49e-46
Log-likelihood       −331.2098   Akaike criterion     668.4196
Schwarz criterion    678.6679    Hannan-Quinn         672.5559
rho                  0.908895    Durbin-Watson        0.211163
Appendix 3. Results of the IBM Watson Analytics Predict function for currencies.
[Chart] Blue: P&G and NKK225; Green: P&G and Gold
[Chart] Blue: Brent and Shanghai; Green: Brent and NKK225
[Chart] Blue: Bank of America and Coke; Green: Bank of America and Gold
[Chart] Blue: Brent and Gold; Green: Gold and Natural Gas
[Chart] Blue: Brent and Shanghai; Green: Brent and CSI300
Appendix 4. Results of the IBM Watson Analytics Predict function for stocks.
[Chart] Blue: Brent and Nasdaq; Green: NKK225 and Brent
[Chart] Blue: Nasdaq and Natural Gas; Green: Shanghai and Natural Gas
[Chart] Blue: FTSE 100 and Gold; Green: DJI and Nasdaq
[Chart] Blue: Gold and Natural Gas; Green: FTSE 100 and Natural Gas
[Chart] Blue: FTSE 100 and Gold; Green: DJI and Gold
[Chart] Blue: Gold and Brent; Green: Brent and NKK225
[Chart] Blue: FTSE 100 and NKK225; Green: FTSE 100 and Nasdaq