OSINT & Alternative Data

Over the past decade, a large part of my work in machine intelligence at Walmart, HSBC, and the European Commission & Parliament has focused on building their first open-source intelligence (OSINT) and alternative data capabilities. As such, my first move at every organization has been to implement a methodology called a “domain analysis.” It uses various machine learning techniques and augmented intelligence platforms to create an ecosystem that can analyze massive amounts of OSINT, alternative, and internal data to find patterns between products, events, markets, trends, and people before deciding what to prioritize. The technique can be leveraged in both simple and complex situations, and in almost any strategic program, such as M&A activity or deciding which technology stack a firm should invest in. When done correctly, it leads to strategies and decisions that are conscious of both market opportunities and risks, at a speed not possible with any other strategic planning methodology.

Open-source intelligence (OSINT) is a type of intelligence gathering that uses publicly available information. It can be used to gather information about a company's competitors, business partners, and other aspects of its business. Examples include news articles, social media, economic, governmental, and market data, and open-sourced corporate data such as Google search trends.

Combined with machine learning techniques such as natural language processing, OSINT offers many advantages. The insights can be used to develop a company's corporate strategy by identifying new market opportunities, assessing the competitive landscape, and showing how a company's competitors or consumers are reacting to its strategic moves (a minimal sketch of one such building block follows the list below).

  • Using machines to extract patterns from the world's collective knowledge enables a faster understanding of domains such as competition, financial markets, disruptors, and politics that affect a business or investment.

  • Less prone to bias, more contextual, and faster than heuristics.

  • Intelligence is cheaper and quicker than internal data, business expertise, or external consultants, and often more actionable and novel for progressive firms.
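
To make that concrete, here is a minimal sketch of one NLP building block for OSINT: extracting the organizations, places, and events mentioned in public news text. It assumes spaCy and its small English model (en_core_web_sm) are installed; the headline is an illustrative example, not real coverage.

```python
# A minimal sketch: named-entity extraction over a public headline with spaCy.
# Assumes `pip install spacy` and `python -m spacy download en_core_web_sm`.
import spacy

nlp = spacy.load("en_core_web_sm")

# Illustrative headline, not an actual news item.
headline = ("Walmart expands grocery delivery in the UK as HSBC warns "
            "of slowing consumer spending ahead of Brexit")

doc = nlp(headline)
for ent in doc.ents:
    # Prints each detected entity with its type, e.g. ORG, GPE, EVENT.
    print(ent.text, ent.label_)
```

Run at scale over news feeds, this kind of extraction is what lets a machine surface which companies, markets, and events keep appearing together before a human has decided what to look for.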

Snip20200507_19.png

At most firms, strategy and priorities are set by the most senior and experienced personnel. After that, teams go out to find information and data to diagnose or enrich the initial strategy's lens. It's almost certain that this process will leverage both Google search and Excel spreadsheets and, in some cases, a dashboard. Most of the time, the data used only contains financials or market-share comparisons of revenue streams. While Google delivers relevant results and financial fundamentals show the validity of a market or business, it is still up to humans to contextualize all the information to assess the strategy's merits.

There are major issues with this approach:

  • Information is distorted by each teammate's individualized Google search results, in addition to news and social media feeds, leading to no coherent center of truth.

  • The strategy is anchored to currently known information. Peripheral, unknown trends that influence outcomes (call this "dark matter") are ignored.

  • Excel doesn't do a good job of contextualizing multiple data streams, and fundamental data alone is just the tip of the iceberg.

As such, companies have a hard time making sense of the system that underlies the market they are targeting. They favor defined albeit superficial intelligence that confirms pre-existing strategies and then optimizes them, rather than actively seeking to find flaws in the thesis and new opportunities for value creation. It is a core reason why all but a few businesses often appear tone-deaf, and many are upended daily by market volatility or disruption from competitive forces.

 
Snip20200916_52.png
 

Legacy vs. Zero Assumption

Given that strategy is human-driven, analytics are typically only brought in after a strategy is already being executed. In doing so, the option to refine the strategy using a data-driven approach is only available at the next implementation. As a result, firms have difficulty creating value: data science is applied only to already known information or processes, which creates an "optimization trap." This approach makes no attempt to create value in any novel way from multiple signals before setting the strategy, producing uncompetitive products at an inflated cost, both monetary and temporal.

Snip20200506_16.png

Outside of the topic in question, i.e., "Belt and Road" or "Transition Bonds," a domain analysis starts with zero assumptions about what should be the core focus or topics for prioritization. Instead, it uses machines to find signals within open-source and alternative data to anchor the strategic priorities within the said domain. Taking this approach leads to less bias, faster intelligence, and more precision than experience alone ever could. Combined with human expertise, the technique produces unrivaled outputs, as humans alone lack the requisite processing power to consider all the relevant variables that machines can easily compute.

Snip20200506_17.png

An excellent example of how the process works is using advanced natural language processing and network analytics to cluster OSINT documents that mention “Global Risk” prior to the COVID outbreak. The domain analysis methodology quickly surfaced that pandemics were central to the broader domain (“Global Risk”), thus should be focused on, as well as how it is connected to more obvious narratives such as recession, oil, and the USD. While this information wouldn’t have stopped the COVID epidemic, at the minimum, companies would have been better prepared and bought themselves time.
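
As a rough illustration of that clustering step (not the original pipeline), the sketch below embeds a handful of made-up headlines with TF-IDF, links documents whose cosine similarity exceeds an illustrative threshold, and uses network centrality to see which narratives sit at the core of the domain. It assumes scikit-learn and networkx are installed.

```python
# A minimal sketch: TF-IDF document similarity turned into a document network,
# then ranked by degree centrality. Headlines and threshold are illustrative.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Global risk report flags pandemic preparedness as a top concern",
    "Recession fears grow as oil prices slide and the USD strengthens",
    "Novel virus outbreak raises questions about global supply chains",
    "Central banks weigh responses to slowing growth and market volatility",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
sim = cosine_similarity(tfidf)

G = nx.Graph()
G.add_nodes_from(range(len(docs)))
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        if sim[i, j] > 0.05:               # illustrative similarity threshold
            G.add_edge(i, j, weight=sim[i, j])

# Documents (and by extension narratives) with high centrality anchor the domain.
centrality = nx.degree_centrality(G)
for idx, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{score:.2f}  {docs[idx]}")
```

On a real corpus of thousands of articles, the same structure (embed, link, measure centrality, then cluster) is what surfaces a topic like "pandemic" as central to "Global Risk" without anyone having asked for it.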

Snip20200505_2.png

Furthermore, OSINT data from the World Bank combined with topological data analysis surfaced that Germany, Austria, and France were the most robust countries against COVID, and that these economies would probably open up sooner. Additionally, a review of Google Search Trends suggested the market perhaps believed the FTSE was the most exposed exchange, given the high correlation between Google search interest and the VIX volatility index during the COVID pandemic, likely due to additional concerns over how the UK economy could cope with the combination of QE policies and Brexit post-lockdown. To a fund manager or a corporate, a two-week lead time can be worth hundreds of millions of dollars.
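
A minimal sketch of that kind of check, correlating relative Google search interest in "FTSE" with VIX levels over the early COVID window, might look like the following. It assumes the unofficial pytrends client and yfinance are installed; the dates and ticker are illustrative.

```python
# A minimal sketch: search interest for "FTSE" vs. VIX closing levels.
# Assumes `pip install pytrends yfinance`; window and term are illustrative.
import yfinance as yf
from pytrends.request import TrendReq

start, end = "2020-01-01", "2020-06-01"

# Relative Google search interest (scaled 0-100 by Google) for "FTSE".
pytrends = TrendReq(hl="en-US", tz=0)
pytrends.build_payload(["FTSE"], timeframe=f"{start} {end}")
searches = pytrends.interest_over_time()["FTSE"]

# VIX closing levels over the same window.
vix = yf.Ticker("^VIX").history(start=start, end=end)["Close"]
vix.index = vix.index.tz_localize(None).normalize()  # align calendar dates

joined = searches.to_frame("ftse_search").join(vix.rename("vix"), how="inner")
print(joined.corr())  # a high positive value would echo the observation above
```

Correlation alone proves nothing about causation, but as a fast, free signal it is exactly the sort of thing a domain analysis can pick up weeks before it shows up in internal reporting.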

Snip20200509_36.png

What Can Organizations Do? 

The companies that get the most out of these systems will be those that can develop the fastest "information to action" times at scale, since the level of knowledge needed to outrun or beat machine intelligence increases exponentially every year. Managers must accept that the value of their expertise and information diminishes rapidly. Windows to exploit and/or maintain a market position close faster than ever, and political disputes that can threaten a business strategy, along with new competition, seemingly emerge from nowhere. As such, over the next one to two years, the most successful companies will embed in their ethos and culture that the burden of proof is on humans, not machines, and that the most valuable expertise is the ability to synthesize disparate signals in the data with machines that enhance the ability to find patterns, and to quickly pivot to where those lead, rather than a specific expertise or knowledge base. The difference in outcomes will be as drastic as that between a captain who has mastered the use of a compass and map and one who can only sail where there is a familiar shoreline when trying to reach the new world.

Snip20200721_17.jpg

Further, firms will need to be mindful that a "good" or "ok" decision's value is highest at the beginning, and is often much more valuable than a perfect choice made later. However, this is not always true. Nonetheless, focusing on developing processes that lower "information-to-action" times (similar to how traders look at financial markets) is essential, as the margins of a competitive edge become smaller but, at the same time, exponentially more valuable at the global level.

To Think About + Getting Started

Most of these outputs will be new to an organization, so it can take time, and a low ego, to get the most out of machines and alternative data sets. Using techniques such as network analysis, clustering, and language summarization will enable firms to start viewing the world through a systems lens (more examples in this article), which is far better at mirroring risks and opportunities than internal data and human expertise alone. But the world is interconnected, ambiguous, and complex most of the time; as such, so are this data and its outputs. Decision-makers should learn to defer judgment rather than immediately revert to their heuristics when they do not understand machine-derived outputs. Be mindful that while humans crave black-and-white classifications, being more accepting of ambiguity and probabilistic outputs leads to better decision outcomes. Furthermore, first-class ecosystems will be able to adapt to many problem sets, as diverse as what technologies to buy, H.R. talent needs, or corporate strategy, and do so at a fraction of the cost of technologies defined on a case-by-case/proof-of-concept basis. Needless to say, the best way to learn is by doing, and one of the greatest upsides of OSINT over other types of data is that it can be accessed and leveraged immediately.
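
For the language-summarization piece mentioned above, a minimal sketch using the Hugging Face transformers library is below; the model choice and the article snippet are illustrative assumptions, not part of any production setup.

```python
# A minimal sketch: abstractive summarization of an OSINT-style article.
# Assumes `pip install transformers torch`; model choice is illustrative.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# Illustrative article text; in practice this would be a full news story.
article = (
    "Global shipping costs rose sharply this quarter as port congestion spread "
    "across major Asian hubs, prompting several retailers to warn of delayed "
    "inventories ahead of the holiday season and to explore alternative routes."
)

summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```

Summaries like this are what let a small team triage thousands of open-source documents a day instead of a few dozen.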

OSINT_Upside.png

Free OSINT tools you can explore now

Google Trends is my first go-to in all cases. It’s simple, quickly allowing users to compare the relative search volume of two or more terms. Google Trends is free and is an excellent example of open-sourced intelligence from a corporate. Search volume is extremely predictive in politics, retail, and finance, as it can be used as a leading, not lagging, indicator. One valuable technique is looking at how different topics/keywords contrast or correlate with one another, i.e., holidays, flights, credit cards, and mortgages. Businesses often look at these products in silos when they are influenced by a variety of factors; Google Trends can quickly quantify those hypotheses, especially for macroeconomic themes. Additionally, Google Trends has been deadly accurate in predicting electoral outcomes where the polls have failed, showing both Brexit and Trump coming out on top.
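
For readers who want to go beyond the web interface, the sketch below uses the unofficial pytrends client to pull relative interest for a few related terms and check how they co-move; the terms and timeframe are illustrative.

```python
# A minimal sketch: comparing relative search interest for related terms.
# Assumes `pip install pytrends`; keyword list and timeframe are illustrative.
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=0)
pytrends.build_payload(["holidays", "flights", "mortgage"], timeframe="today 5-y")

interest = pytrends.interest_over_time()          # weekly relative volume, 0-100
print(interest.drop(columns="isPartial").corr())  # how the topics co-move
```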

Google Search Trends

Wolfram Alpha is a computation tool that contains data such as GDP or population growth as well as entities, i.e., companies, places, or markets. It's a computational knowledge engine that answers factual queries directly by computing the answer from externally sourced "curated data," rather than providing a list of documents or web pages that might contain the answer, as a search engine would. It's a great way to get access to Morningstar data as well as projections; the example below shows a downward trend for HSBA and upward trends for GS and JPM. For most people in corporate strategy or investments, it is good enough.
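
Wolfram Alpha also exposes an API, and the unofficial wolframalpha Python client makes quick factual queries easy. The App ID in the sketch below is a placeholder you would replace with your own, and the query is illustrative.

```python
# A minimal sketch: a factual query against the Wolfram Alpha API.
# Assumes `pip install wolframalpha` and a free App ID from developer.wolframalpha.com.
import wolframalpha

client = wolframalpha.Client("YOUR_APP_ID")   # placeholder App ID
res = client.query("GDP of Germany")

# `results` yields the pods Wolfram Alpha flags as primary answers.
print(next(res.results).text)
```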

wolfram_alpha.png

The Globe of Economic Complexity shows the true scale of the world economy. It visualizes 15 trillion dollars of world trade: one node equals 100 million dollars of exported products, and the visualization shows how product spaces and countries are interconnected. It's the best example of economic data and complex systems visualized in their full reality. If you are not familiar with graph analytics, look at the Atlas of Economic Complexity, which the examples below are from.
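
If you want to experiment with the same idea on your own data, the sketch below builds a tiny country-product trade graph with networkx and ranks nodes by degree centrality; the edge list uses illustrative figures, not actual trade data.

```python
# A minimal sketch: a bipartite country-product trade graph, in the spirit of
# the Globe of Economic Complexity. Assumes `pip install networkx`.
import networkx as nx

# Country nodes connected to the products they export, weighted by export
# value in USD. Figures are illustrative, not real trade statistics.
edges = [
    ("China", "Electronics", 600e9),
    ("China", "Textiles", 250e9),
    ("Germany", "Cars", 230e9),
    ("Germany", "Machinery", 200e9),
    ("Saudi Arabia", "Crude Oil", 180e9),
]

G = nx.Graph()
for country, product, value in edges:
    G.add_edge(country, product, weight=value)

# Degree centrality hints at which countries and products sit at the core of the network.
print(sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1]))
```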

China_exports_economic_graph.png