“In a world of digital transformation fuelled by insights and analytics, the data ecosystem is paramount,” said Wael William Diab, Chair of SC 42. “Big data revolutionizes IT systems to efficiently address the needs of the application domain by considering the characteristics of the data being processed. The Big Data Reference Architecture (BDRA) international standard and its companion series will accelerate the adoption of this enabling technology by providing an architectural framework and common language for the various stakeholders.”
Living in a data world
Search statistics show an average of 40,000 search queries every second, which works out at around 3.5 billion searches per day and 1.2 trillion searches per year worldwide. Statista indicates there are currently 3.5 billion smartphone users sending messages, uploading video and photo content and using other apps on their phones, all of which create data.
According to an IDC report, revenue from big data and analytics (BDA) solutions is expected to reach USD 274.3 billion by 2022. The report notes that banking, discrete manufacturing, professional services, process manufacturing, and federal/central government currently make the largest investments in BDA solutions. Alongside the benefits of big data analytics, there are concerns about the quality and management of data, as well as how it is generated, used, stored and protected.
The standard, known as the big data reference architecture (BDRA), gives developers an architecture framework for describing big data components, processes and systems, and establishes a common language for the various stakeholders. It is a tool for describing, discussing and developing system-specific architectures against a common reference framework that covers the requirements, structures and operations inherent to big data.
“Emerging technology standardization policy and governance is a high priority for the European Commission, United Nations and World Economic Forum. The ISO/IEC 20547-3 big data reference architecture provides guidance to users, consumers, generators, managers and integrators of big data in big data systems. Standards like the BDRA are the foundation on which future certification, regulation and legislation can be built,” said Ray Walshe, Project Editor of the BDRA standard.
Addressing the big data ecosystem
The standard describes the big data ecosystem by defining two viewpoints. Each viewpoint looks at the system from the perspective of its stakeholder group and details the architecture to address their concerns. Specifically:
- User View – defining parties, roles/sub-roles, their relationships, types of activities and cross-cutting aspects within a big data ecosystem.
- Functional View – defining the architectural layers and the classes of functional components within those layers that implement the activities of the roles/sub-roles within the user view.
Working from these two views, developers can select specific implementation approaches and deployment strategies to carry out their mission-critical functions, considering:
- Implementation – covering the functions necessary for the implementation of big data within service parts and/or infrastructure parts.
- Deployment – describing how the functions of big data are technically implemented within already existing infrastructure elements or within new elements to be introduced in this infrastructure.
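The relationship between the two views can be illustrated with a small data model. This is a minimal sketch and not taken from the standard: the class names, the `unimplemented` helper and the example role, activity and layer names are all hypothetical, chosen only to show functional components being checked against the activities they are meant to implement.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Activity:
    """User view: an activity performed by a role or sub-role (names are illustrative)."""
    role: str
    name: str


@dataclass
class FunctionalComponent:
    """Functional view: a component in a layer that implements some activities."""
    layer: str  # hypothetical layer name, not a term from the standard
    name: str
    implements: set[Activity] = field(default_factory=set)


def unimplemented(activities, components):
    """Return the user-view activities that no functional component implements."""
    covered = set().union(*(c.implements for c in components))
    return [a for a in activities if a not in covered]


# Example: one activity is covered by a component, the other is not.
collect = Activity("data collection", "ingest records")
analyze = Activity("data analysis", "aggregate values")
components = [FunctionalComponent("application", "ingest service", {collect})]
gaps = unimplemented([collect, analyze], components)
print([a.name for a in gaps])
```

A check like this is one way an architect might confirm that every activity identified in the user view has a home in the functional view before choosing an implementation or deployment approach.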
Ensuring data security and privacy
We generate all sorts of personal data when we email friends, purchase products, pay bills or make online reservations. Data is also generated about us, such as our digital medical files, which are stored and sometimes shared by different health professionals. It is vital that all our data remain secure and private.
The standard notes three important cross-cutting aspects related to data:
- Security and privacy: which relates to how systems and data are secured by preserving their confidentiality, integrity and availability against risk, and how personally identifiable information (PII) is protected from unauthorized use.
- Management: which concerns how system components and resources are provisioned, configured, utilized, and monitored.
- Data governance: which covers how data is controlled and managed within the system over its lifecycle.
Big data providers and consumers
The big data ecosystem is vast and can be broken down into three main groups: activities that use big data, activities that provide big data analytics services and activities that provide data.
“The goal is to provide a secured reference architecture that is vendor-neutral, technology- and infrastructure-agnostic to enable any stakeholders (data scientists, researchers, etc.) to perform analytics processing for their given data sources without worrying about the underlying computing environment,” said Wo Chang, Convenor of SC 42 Working Group 2 on big data.
A focus on the big data analytics lifecycle
The standard contains descriptions of some of the common roles and sub-roles associated with big data.
The key idea is to let the big data service partner (BDSP) orchestrate how one or more datasets are brought in from the big data provider (BDP), and to focus on the analytics lifecycle in the big data application provider (BDAP). This is done by instantiating one or more instances of each sub-role (data collection, data preparation, data analysis and data visualization) without worrying about the underlying computing environment supplied by the big data framework provider (BDFP). As the BDFP is improved and enhanced, the BDAP analytics tools and analyses do not need to be re-tooled.
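The separation of roles described above can be sketched in code. This is an illustrative sketch under stated assumptions, not an implementation defined by the standard: every class, method and field name here (`FrameworkProvider`, `LocalFramework`, `orchestrate`, the `value` and `source` keys) is hypothetical. It shows the four analytics sub-roles running against an interchangeable framework object, so the analytics code stays the same when the backend changes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol


class FrameworkProvider(Protocol):
    """Stands in for the BDFP: runs a function over records on some backend."""

    def run(self, fn: Callable[[dict], dict], records: Iterable[dict]) -> list[dict]: ...


class LocalFramework:
    """Hypothetical in-process backend; another backend would expose the same method."""

    def run(self, fn, records):
        return [fn(r) for r in records]


@dataclass
class DataProvider:
    """Stands in for the BDP: supplies a named dataset."""
    name: str
    records: list[dict]


class ApplicationProvider:
    """Stands in for the BDAP: one method per analytics sub-role."""

    def __init__(self, framework: FrameworkProvider):
        self.framework = framework

    def collect(self, providers):
        # Data collection: gather records and tag each with its source.
        return [dict(r, source=p.name) for p in providers for r in p.records]

    def prepare(self, records):
        # Data preparation: drop missing values, then normalize to float
        # using whatever backend the framework provider supplies.
        valid = [r for r in records if r.get("value") is not None]
        return self.framework.run(lambda r: {**r, "value": float(r["value"])}, valid)

    def analyze(self, records):
        # Data analysis: total value per source.
        totals = {}
        for r in records:
            totals[r["source"]] = totals.get(r["source"], 0.0) + r["value"]
        return totals

    def visualize(self, totals):
        # Data visualization: a plain-text report, sorted by source name.
        return "\n".join(f"{src}: {total:.1f}" for src, total in sorted(totals.items()))


def orchestrate(providers, framework):
    """Stands in for the BDSP: wires data providers and a framework into the lifecycle."""
    app = ApplicationProvider(framework)
    return app.visualize(app.analyze(app.prepare(app.collect(providers))))


sensors = DataProvider("sensors", [{"value": "1.5"}, {"value": None}, {"value": 2}])
sales = DataProvider("sales", [{"value": 10}])
report = orchestrate([sensors, sales], LocalFramework())
print(report)
```

Because `ApplicationProvider` depends only on the `run` method, swapping `LocalFramework` for a different backend would not require changes to the collection, preparation, analysis or visualization steps.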
“The beauty of using this BDRA approach is that it will enable us to transform BDAP into big data analytics as services (BDAS) as our next step to explore how BDAS can support traditional analytics, such as statistical analysis, classification, etc., and AI machine learning and deep learning analytics,” said Chang.
AI has demonstrated its machine learning and deep learning capabilities by solving a range of practical problems, from computer vision, speech recognition and natural language processing to emerging applications such as self-driving cars, drug discovery and toxicology, and financial fraud detection.
As AI algorithms continue to advance, many industries are reaping the benefits of these technologies: customer recommendation systems look for consumer patterns, retailers use augmented reality (AR) and virtual reality (VR) functionality in advertising, and robotic assistants are used in surgical environments and in hospitality industries such as hotels and tourism.
AI depends on good-quality data for training, and big data has the means to provide it. This is especially the case when varieties of data from multiple data sources are combined to create an integrated data source for AI consumption. SC 42 continues to develop work in both areas with a portfolio of AI and big data standards that can enable scalable analytics as a service to support future AI analytics and system needs.
More about the big data series of standards
The ISO/IEC 20547 series offers a standardized approach to developing and implementing big data architectures and provides references for those approaches. ISO/IEC TR 20547-1 gives an overview of the reference architecture framework and a process for applying that framework in developing big data applications. ISO/IEC TR 20547-2 provides a collection of big data use cases and breaks these down into technical considerations for big data reference architecture development. ISO/IEC 20547-4 describes the security and privacy aspects unique to big data. ISO/IEC TR 20547-5 provides a list of standards and their relationship to the reference architecture that architects and implementers can consider as part of the design and implementation of their system.
Additionally, ISO/IEC 20546 provides a conceptual overview of the field of big data with a set of terms and definitions for establishing a common understanding of what constitutes big data.
Find out more about the work of SC 42