Hundreds of millions affected
The World Health Organization (WHO) estimated in 2014 that more than 285 million people worldwide are visually impaired: 39 million are blind and 246 million have low vision. About 90% of the world's visually impaired live in low-income settings.
The WHO also estimates that over 5% of the world’s population – 360 million people – has disabling hearing loss (328 million adults and 32 million children).
The majority of people with disabling hearing loss live in low- and middle-income countries.
Ageing is a contributory factor to both visual and hearing impairment (82% of people living with blindness are aged 50 and above).
Visual and hearing impairments have impacts on personal, emotional, social, societal and economic levels.
Individuals with these impairments have difficulty communicating and interacting with their peers. This can lead to feelings of loneliness, isolation and frustration, particularly among older people, according to the WHO. The academic performance and employment prospects of the visually and hearing impaired are also adversely affected, often forcing them into lower-paid jobs. The problems are more severe in low-income countries and settings.
Article 9 of the 2006 UN Convention on the Rights of Persons with Disabilities, which deals with accessibility issues, states that “States Parties shall take appropriate measures to ensure to persons with disabilities access, on an equal basis with others, to (…) information and communications, including information and communications technologies and systems.”
Provisions for the accessibility and usability of many ICT products and services are incorporated into national legislation in some countries. These include accessibility features such as subtitling, signing or audio description for people with sensory disabilities. In many countries and regions, broadcasters are required to provide universal access to audiovisual content.
Accessibility, a priority for the IEC
The IEC is actively developing International Standards for AAL in a wide range of areas.
To achieve this, IEC Technical Committee (TC) 100: Audio, video and multimedia systems and equipment, and several of its Technical Areas (TAs), have developed a number of specific International Standards over the years. However, in 2014, TC 100 found it needed to create a dedicated TA, TA 16: Active Assisted Living (AAL), accessibility and user interfaces, to “develop international publications addressing aspects of active assisted living, accessibility, usability and specific user interfaces related to audio, video and multimedia systems and equipment within the scope of TC 100”.
(See article on TA 16 in e-tech, December 2014)
TA 16 is currently:
- Developing Edition 2 of IEC 62731:2013, Text to Speech for Television – General Requirements
- Finalizing IEC 62944 Ed. 1.0, Digital Television Accessibility – Functional specifications (publication expected at the end of November 2016)
- Finalizing IEC 63080 Ed. 1.0, Accessibility terms and definitions (publication expected in early 2017)
The IEC Standardization Management Board (SMB) established a Strategy Group, SG 5: Ambient Assisted Living (AAL), in 2011. SG 5 was later transformed into SEG 3, a Systems Evaluation Group on AAL. Following a recommendation by SEG 3, the SMB agreed to disband SEG 3 and to create a Systems Committee, IEC SyC AAL: Active Assisted Living (AAL), to help users of all ages, including older persons and persons with a temporary or permanent disability, live a meaningful, active and independent life.
Use cases related to accessibility have been collected in IEC SyC AAL as well as in the TC 100 study session on wearable technologies.
Different solutions for different impairments
Access to broadcast content for the visually impaired is based predominantly on audio solutions.
For TV broadcasts this can be done through audio description of the on-screen setting/action that complements the audio content already available (e.g. dialogues).
Where radio speech content is concerned, the elderly may have difficulty separating narration from background music and sound effects, and in understanding it. This results from degradation of inner-ear function as well as deterioration of processing ability in the auditory centre. Japan's public broadcaster NHK has developed an adaptive speech rate conversion technology based on speech interval detection. Output speech can be delivered more slowly than the original input: non-speech intervals are deleted and the speech is scaled in time, whilst ensuring that the overall length remains the same and that the pitch is not affected.
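The core idea can be sketched in a few lines: detect non-speech intervals by frame energy, then work out how much the speech can be slowed down once those intervals are removed, with the total duration unchanged. This is only an illustrative sketch, not NHK's implementation; the frame length, energy threshold and the simple RMS-based detector are assumptions, and a real system would use a pitch-preserving time-stretch (e.g. a phase vocoder or WSOLA) to play the speech at the computed factor.

```python
import numpy as np

def speech_intervals(signal, rate, frame_ms=20, threshold=0.01):
    """Label each frame as speech/non-speech by short-term RMS energy.
    Returns one boolean per frame (a crude stand-in for a real detector)."""
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sqrt((frames ** 2).mean(axis=1))
    return energy > threshold

def slowdown_factor(is_speech):
    """If non-speech frames are deleted, the remaining speech can be
    stretched by this factor while keeping the overall duration the same."""
    speech = int(is_speech.sum())
    if speech == 0:
        return 1.0
    return len(is_speech) / speech

# Example: 1 s of "speech" (a tone) followed by 0.5 s of silence
rate = 16000
t = np.linspace(0, 1, rate, endpoint=False)
sig = np.concatenate([0.5 * np.sin(2 * np.pi * 440 * t), np.zeros(rate // 2)])
flags = speech_intervals(sig, rate)
print(round(slowdown_factor(flags), 2))  # 1.5: speech can be played 1.5x slower
```

The factor of 1.5 here simply reflects that a third of the example signal is silence; deleting it leaves room to stretch the speech by 50% without lengthening the programme.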
Audiobooks, first introduced on long-playing records in the 1930s, were initially aimed at giving the visually impaired access to printed works. They later adopted audiocassettes as their primary medium. In the 1990s audiobooks moved gradually from an analogue to a digital format (CD). The need to define the audiobook electronic file format structure to ensure compatibility with music industry and multimedia standards, as well as how to present and navigate an audiobook effectively, led TA 10: Multimedia e-publishing and e-book technologies to develop IEC 62571:2011, Digital audiobook file format and player requirements. This Standard "defines requirements and provides recommendations to publishers, software developers, content providers, and hardware manufacturers for the data structure, usability requirements, playback systems and delivery systems for audiobooks in digital file format."
Access to ICT products and services for the visually impaired can be ensured through text enlarged via adjustable fonts and magnification or the conversion of written material into spoken text using optical character recognition (OCR) software.
International Standards for OCR are being developed by ISO/IEC JTC 1/SC 31: Automatic identification and data capture techniques, a Subcommittee of the Joint Technical Committee for Information Technology set up by the IEC and the International Organization for Standardization (ISO).
At this year's International Broadcasting Convention (IBC), Europe's largest professional broadcast show, held every year in Amsterdam, a number of R&D departments from public broadcasters, universities and telecommunications companies presented solutions aimed at providing access to multimedia and ICT products and services for people with hearing, visual and age-related impairments.
These solutions face different challenges linked to the nature of the content, such as live or recorded broadcasts or archived material (analogue or digital, with or without metadata), as well as the language and/or writing structure and broadcasting system/format.
Subtitling is a complex process
Most people are used to seeing subtitles in films or in recorded television interviews when spoken words may be difficult to understand. Subtitlers of television programmes face different kinds of challenges when the result is intended for live or for pre-recorded broadcasts, for large collections of video clips or for a combination of subtitling and sign language.
These different sets of challenges and the solutions applied to meet them were presented by researchers from Ericsson and the University of Edinburgh (UK), BBC Research & Development and NHK at an IBC 2016 paper session on Novel Technologies for Assisting Sensory-Impaired Viewers.
High-quality subtitling linked to low latency
Latency remains one of the most significant factors in the audience’s perception of quality in live-originated TV captions for the deaf and hard of hearing, according to joint Ericsson/University of Edinburgh research. Once all prepared script material has been shared between the production team and captioners, “pre-recorded video content remains a significant challenge – particularly ‘packages’ for transmission as part of a news broadcast,” according to Ericsson’s Matt Simpson. These video clips are usually published just prior to or even during their intended broadcasting slot, providing little opportunity for thorough preparation.
Automatic speech recognition (ASR) based on context-tuned models and the application of machine learning across large volumes of data help meet some of these challenges. However, other aspects still need improvement, such as fidelity to the original spoken word, the textual accuracy of the transcript (optimal accuracy being 95% and the minimal threshold 90%, with no more than 10% of the content missing) and the timeliness with which it is presented. ASR is set to play a growing role in support of captioning for live broadcasts, but the audio quality of the original video content (quiet or noisy background) remains important.
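Those accuracy figures are typically measured as word accuracy, i.e. one minus the word error rate (WER) computed from an edit distance between the reference and the ASR transcript. A minimal sketch of such a check, assuming a simple word-level Levenshtein distance (the exact metric used in the research is not specified in the article):

```python
def word_accuracy(reference, hypothesis):
    """Word-level accuracy: 1 - WER, via Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return max(0.0, 1 - d[len(ref)][len(hyp)] / len(ref))

ref = "the quick brown fox jumps over the lazy dog today"
hyp = "the quick brown fox jumps over a lazy dog today"  # 1 substitution
acc = word_accuracy(ref, hyp)
print(f"{acc:.0%}")  # 90% - at the minimal threshold, below the 95% target
```

A single wrong word in a ten-word caption already drops the transcript to the 90% floor, which illustrates how demanding the 95% target is for live material.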
Re-using archived content
Broadcasters hold large archives of quality material produced years, even decades, ago. This content is not always subtitled, but is often rebroadcast as viewers like to discover or rewatch classics, comedies or history programmes.
Mike Armstrong from BBC R&D told participants in the session that the BBC was providing subtitles for 100% of its TV programmes on all its main channels as well as on its video-on-demand (VOD) service and websites. Recent BBC audience research showed that subtitle use was not limited to the hearing impaired but that around 10% of the adult TV audience use subtitles daily. Overall, subtitle use is around 18% and even as high as 20% on tablets. Most interesting are the findings for children’s programmes, where subtitle usage is around 30% and around 35% for content classified as “Learning”.
The BBC has thousands of hours of video content and until now subtitling has been a manual process, done either by retrieving subtitles from original content or by creating new ones.
The BBC tested a three-step system for video clips (not for full-length programmes). It used 500 hours of content from some 7 500 audio and video files of the BBC Bitesize archives to assess automation of the subtitling process. This system includes:
- Identifying the source programme and retrieving assets using the BBC’s Programme Information Platform (PIPs) metadata system and off-air (Redux) archives, locating the relevant section of the programme by searching within programmes and creating search strings
- Matching audio and text using “Chromaprint” open source audio fingerprinting
- Retiming subtitles and verifying output
Trials resulted in a 46.7% success rate. Even if a significant amount of work is still needed to obtain a broadcast-ready product, this experimental project is promising and paves the way for the automated production of subtitle files for video clips.
Computer-generated hybrid sign language/subtitling
At the same IBC 2016 session, Shuichi Umeda from NHK outlined the particular challenges faced by the Japanese broadcaster in offering services for hearing-impaired viewers.
NHK is developing a system for computer generation (CG) of Japanese Sign Language (JSL) graphics, currently being tested with online meteorological information.
Persons whose first language is JSL, which is a different language from Japanese, have been demanding more TV programmes with sign language in addition to closed captions, as they may not be fully familiar with Japanese characters.
As sign-language interpreters are not available in sufficient numbers, CG production of JSL graphics is seen as an interesting solution. However, it must be capable of generating realistic avatars (characters) that can reproduce facial expressions as well as hand signs.
NHK production of CG JSL graphics is based on templates using fixed phrases translated into strings of sign language animations, on 3D models of characters and on optical motion capture of markers attached to the joints and faces of signers.
NHK is currently testing this automatic system to generate weather forecasts for all of the 47 prefectural capitals of Japan so that users can see the latest weather forecast via the Internet for any of these cities in the form of CG sign language.
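The template-driven part of such a system can be illustrated very simply: fixed phrases map to ordered strings of sign-animation clips. This sketch is purely illustrative; the clip identifiers, the template, and the sign order are invented for the example, and a real system plays motion-captured JSL animation data through a 3D avatar rather than printing identifiers.

```python
# Hypothetical sign-animation clip IDs; a real system would reference
# motion-captured JSL animation data rendered on a 3D avatar.
SIGNS = {
    "tokyo": "JSL_PLACE_TOKYO",
    "osaka": "JSL_PLACE_OSAKA",
    "sunny": "JSL_WX_SUNNY",
    "rain": "JSL_WX_RAIN",
    "tomorrow": "JSL_TIME_TOMORROW",
}

# A fixed-phrase template: JSL is a distinct language, so the template
# encodes the sign order, not the Japanese (or English) word order.
FORECAST_TEMPLATE = ("{city}", "{day}", "{weather}")

def forecast_signs(city, day, weather):
    """Expand the template into a playable sequence of animation clips."""
    slots = {"{city}": city, "{day}": day, "{weather}": weather}
    return [SIGNS[slots[slot]] for slot in FORECAST_TEMPLATE]

print(forecast_signs("tokyo", "tomorrow", "rain"))
# ['JSL_PLACE_TOKYO', 'JSL_TIME_TOMORROW', 'JSL_WX_RAIN']
```

Because weather reports draw on a small, fixed vocabulary, this lookup-and-assemble approach scales easily to all 47 prefectural capitals; open-ended content is far harder, which is why general programming is not yet covered.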
It must be said that CG of JSL graphics for weather reports is a relatively simple process as it relies on the set sentences, phrases and signs most commonly used in weather reports. The same doesn’t apply yet to most other forms of Japanese TV content, although NHK has announced its plans to provide these CG descriptions for the 2020 Tokyo Olympics by generating them automatically from Olympic Data Feed messages, which are play-by-play event records.
The IEC is also liaising and working with other international and professional organizations like the International Telecommunication Union (ITU) or the European Broadcasting Union (EBU), which work on developing solutions to provide access to broadcast and ICT products and services to people suffering from visual or hearing impairment. IEC TC 100 maintains a Category A Liaison with both.
As for IEC SyC AAL, it also maintains a Category A Liaison with ITU-T/JCA-AHF: Joint Coordination Activity on Accessibility and Human Factors.
Work by the IEC, these organizations and others significantly improves access to multimedia content for persons with visual or hearing impairments.