Monday, 05 March 2018

Human-Computer Interaction: Overview on State of the Art

Recent advances in HCI have made it almost impossible to tell which concepts are fiction and which are, or can be, real. Nowadays, HCI technologies are designed around human behaviour and needs. For instance, an electric kettle needs no sophisticated interface, since its only function is to heat water; it would not be cost-effective to give it anything more than a thermostatic on/off switch. In HCI design, the degree of activity that involves a user with a machine should therefore be thought through carefully. User activity has three different levels: physical, cognitive, and affective.
Recent HCI technologies try to combine different modes of interaction with one another and with other advancing technologies such as virtual reality. One important distinction in the new generation of interfaces is between using intelligence in the making of the interface (intelligent HCI) and using it in the way the interface interacts with users (adaptive HCI). An adaptive HCI might be a website that uses a regular GUI to sell various products but recognizes the user and remembers their searches and purchases. An example that uses both an intelligent and an adaptive interface is a PDA or tablet PC with handwriting recognition that adapts to the handwriting of the logged-in user, improving its performance by remembering the corrections the user made to the recognized text. A minimal sketch of this adaptive behaviour follows.
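As an illustration, the sketch below (plain Python, with hypothetical names; it is not the interface of any real product) wraps a base handwriting recognizer with a per-user memory of corrections, which is one simple way such adaptation could work:

    class AdaptiveRecognizer:
        """Sketch: wrap a base recognizer with a per-user correction memory."""

        def __init__(self, base_recognizer):
            self.base = base_recognizer   # e.g. a stroke-to-text model (assumed given)
            self.corrections = {}         # per user: {raw output: corrected text}

        def recognize(self, user, strokes):
            raw = self.base(strokes)
            # Apply any correction this user has made to this output before.
            return self.corrections.get(user, {}).get(raw, raw)

        def remember_correction(self, user, raw, corrected):
            # Called whenever the user edits the recognized text.
            self.corrections.setdefault(user, {})[raw] = corrected

    # Toy usage with a fake base recognizer that always misreads one word.
    recognizer = AdaptiveRecognizer(lambda strokes: "doudle")
    print(recognizer.recognize("alice", "..."))          # "doudle"
    recognizer.remember_correction("alice", "doudle", "double")
    print(recognizer.recognize("alice", "..."))          # "double"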

Ubiquitous Computing and Ambient Intelligence
The latest research direction in the HCI field is unmistakably ubiquitous computing (ubicomp). The term, often used interchangeably with ambient intelligence and pervasive computing, refers to the ultimate form of human-computer interaction: the removal of the desktop and the embedding of the computer in the environment, so that it becomes invisible to humans while surrounding them everywhere, hence the term ambient.

HCI Systems Architecture
The most important factor of an HCI design is its configuration. In fact, any given interface is generally defined by the number and diversity of the inputs and outputs it provides. The architecture of an HCI system shows what these inputs and outputs are and how they work together. The following sections explain the different configurations and designs upon which an interface can be based.

Unimodal HCI Systems
An interface that relies on a single channel of interaction is called unimodal. Based on the nature of that modality, unimodal HCI systems can be divided into three categories:
1. Visual-Based
2. Audio-Based
3. Sensor-Based

Visual-based human-computer interaction is probably the most widespread area of HCI research. Given the extent of applications and the variety of open problems and approaches, researchers have tackled different aspects of human responses that can be recognized as visual signals. Some of the main research areas in this category are listed below, with a minimal face-detection sketch after the list:
• Facial Expression Analysis 
• Body Movement Tracking (Large-scale) 
• Gesture Recognition 
• Gaze Detection (Eyes Movement Tracking)
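As a concrete illustration of the visual-based category, the sketch below uses OpenCV's bundled Haar cascade to detect faces in a webcam stream. It assumes the opencv-python package and a working camera, and it is only a minimal starting point rather than a full face-analysis system.

    import cv2

    # Pre-trained frontal-face Haar cascade that ships with OpenCV.
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    face_cascade = cv2.CascadeClassifier(cascade_path)

    cap = cv2.VideoCapture(0)  # default webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Detect faces and draw a rectangle around each one.
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("faces", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break
    cap.release()
    cv2.destroyAllWindows()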

Audio-based interaction between a computer and a human is another important area of HCI systems. Research in this area can be divided into the following parts, with a minimal speech-recognition sketch after the list:
• Speech Recognition 
• Speaker Recognition 
• Auditory Emotion Analysis 
• Human-Made Noise/Sign Detections (Gasp, Sigh, Laugh, Cry, etc.) 
• Musical Interaction
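For example, a minimal speech-recognition sketch might look like the following. It assumes the third-party SpeechRecognition and PyAudio packages, a microphone, and network access for the free Google Web Speech API; it is only an illustration of the idea, not a production-grade recognizer.

    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate for background noise, then capture one utterance.
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)

    try:
        text = recognizer.recognize_google(audio)  # send audio to the web API
        print("You said:", text)
    except sr.UnknownValueError:
        print("Speech was unintelligible")
    except sr.RequestError as error:
        print("Recognition service error:", error)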

Sensor-based interaction relies on at least one physical sensor between the user and the machine. As the list below shows, these sensors can be very primitive or very sophisticated; a minimal input-event sketch follows the list.
1. Pen-Based Interaction 
2. Mouse & Keyboard 
3. Joysticks 
4. Motion Tracking Sensors and Digitizers 
5. Haptic Sensors 
6. Pressure Sensors 
7. Taste/Smell Sensors
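To illustrate the most basic end of this range, the sketch below observes mouse clicks and key presses. It assumes the third-party pynput library (any input-event library would do) and simply prints the events it sees.

    from pynput import keyboard, mouse

    def on_click(x, y, button, pressed):
        # Report each mouse button press together with the pointer position.
        if pressed:
            print(f"mouse {button} pressed at ({x}, {y})")

    def on_press(key):
        # Report key presses; stop listening when Esc is pressed.
        print(f"key pressed: {key}")
        if key == keyboard.Key.esc:
            return False

    mouse_listener = mouse.Listener(on_click=on_click)
    key_listener = keyboard.Listener(on_press=on_press)
    mouse_listener.start()
    key_listener.start()
    key_listener.join()   # block until Esc stops the keyboard listener
    mouse_listener.stop()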

Applications
A classic example of a multimodal system is the “Put That There” demonstration system. This system allowed one to move an object to a new location on a map on the screen by saying “put that there” while pointing to the object itself and then pointing to the desired destination. Multimodal interfaces have been used in a number of applications, including map-based simulations such as the aforementioned system, information kiosks such as AT&T’s MATCHKiosk, and biometric authentication systems. A toy sketch of this kind of temporal fusion appears below.
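The sketch below (plain Python, with made-up timestamps) shows the core idea: each deictic word in the utterance is bound to the pointing sample closest to it in time. It illustrates the principle only and is not the original system's algorithm.

    import bisect

    def closest_pointing(pointing_events, t):
        """Return the (timestamp, x, y) pointing sample closest in time to t.
        pointing_events must be sorted by timestamp."""
        times = [p[0] for p in pointing_events]
        i = bisect.bisect_left(times, t)
        candidates = pointing_events[max(0, i - 1):i + 1]
        return min(candidates, key=lambda p: abs(p[0] - t))

    def put_that_there(speech_words, pointing_events):
        """speech_words: list of (timestamp, word). Bind 'that' (the object) and
        'there' (the destination) to the temporally closest pointing gestures."""
        bindings = {}
        for t, word in speech_words:
            if word in ("that", "there"):
                _, x, y = closest_pointing(pointing_events, t)
                bindings[word] = (x, y)
        return bindings.get("that"), bindings.get("there")

    # Hypothetical traces: the user says "put that there" while pointing twice.
    speech = [(0.0, "put"), (0.4, "that"), (1.2, "there")]
    pointing = [(0.5, 120, 80), (1.3, 300, 220)]
    print(put_that_there(speech, pointing))   # ((120, 80), (300, 220))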
A few other examples of applications of multimodal systems are listed below:
• Smart Video Conferencing
• Intelligent Homes/Offices  
• Driver Monitoring
• Intelligent Games  
• E-Commerce
• Helping People with Disabilities 

Multimodal Systems for Disabled People
A good example is a system for users with motor disabilities that combines head movements, which control the cursor, with speech commands. Synchronization between the two modalities is performed by capturing the cursor position at the moment speech is first detected. This is mainly because, while the complete sentence is being pronounced, the head can keep moving and the cursor can drift onto another graphical object; moreover, the command to be carried out forms in the user's mind shortly before the phrase is actually spoken. Figure 5 of the source paper shows a diagram of this system, and a minimal sketch of the synchronization rule follows.
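A minimal sketch of that synchronization rule, with hypothetical event-handler names, could look like this:

    class HeadSpeechPointer:
        """Sketch: latch the cursor position at speech onset so that head
        movement during the utterance does not change the intended target."""

        def __init__(self):
            self.cursor = (0, 0)
            self.latched = None

        def on_head_move(self, x, y):
            self.cursor = (x, y)          # head tracker keeps updating the cursor

        def on_speech_onset(self):
            self.latched = self.cursor    # freeze the target when speech starts

        def on_command_recognized(self, command):
            target = self.latched if self.latched is not None else self.cursor
            self.latched = None
            return command, target

    pointer = HeadSpeechPointer()
    pointer.on_head_move(150, 90)
    pointer.on_speech_onset()             # user starts saying "open this"
    pointer.on_head_move(400, 310)        # head keeps moving while speaking
    print(pointer.on_command_recognized("open"))   # ('open', (150, 90))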

Emotion Recognition Multimodal Systems
As we move towards a world in which computers are more and more ubiquitous, it will become more essential that machines perceive and interpret all clues, implicit and explicit, that we may provide them regarding our intentions. A natural human-computer interaction cannot be based solely on explicitly stated commands. Computers will have to detect the various behavioural signals based on which to infer one’s emotional state. This is a significant piece of the puzzle that one has to put together to predict accurately one’s intentions and future behaviour.
People are able to make predictions about one's emotional state based on their observations of one's face, body, and voice. Studies show that if a judge had access to only one of these modalities, the face modality would produce the best predictions. However, accuracy can be improved by 35% when human judges are given access to both the face and body modalities together. This suggests that affect recognition, which has for the most part focused on facial expressions, can greatly benefit from multimodal fusion techniques.
One of the few works that attempted to integrate more than one modality for affect recognition combined facial features with body posture features to produce an indicator of one's frustration. Another work integrating the face and body modalities showed that, as with humans, machine classification of emotion is better when based on face and body data together than on either modality alone. A further study attempted to fuse facial and vocal data for affect recognition; once again, consistent with human judges, machine classification of emotion as neutral, sad, angry, or happy was most accurate when the facial and vocal data were combined. An illustrative decision-level fusion sketch follows.
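The sketch below shows one simple way such fusion can be done: a generic weighted average of per-class probabilities (decision-level fusion). It assumes NumPy, is not the method of any of the works mentioned above, and the probabilities are invented.

    import numpy as np

    EMOTIONS = ["neutral", "sad", "angry", "happy"]

    def late_fusion(face_probs, voice_probs, w_face=0.5, w_voice=0.5):
        """Combine per-class probabilities from independent face and voice
        classifiers by a weighted average, then pick the most likely emotion."""
        fused = w_face * np.asarray(face_probs) + w_voice * np.asarray(voice_probs)
        return EMOTIONS[int(np.argmax(fused))], fused

    # Hypothetical classifier outputs over [neutral, sad, angry, happy].
    face = [0.20, 0.15, 0.40, 0.25]
    voice = [0.10, 0.20, 0.55, 0.15]
    print(late_fusion(face, voice))   # ('angry', ...)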

Map-Based Multimodal Applications
Different input modalities are suitable for expressing different messages. For instance, speech provides an easy and natural mechanism for expressing a query about a selected object or requesting that the object initiate a given operation. However, speech may not be ideal for tasks such as selecting a particular region on the screen or tracing out a particular path; these tasks are better accommodated by hand or pen gestures. Yet making queries about a given region and selecting that region are both typical tasks that a map-based interface should accommodate. The natural conclusion is that map-based interfaces can greatly improve the user experience by supporting multiple modes of input, especially speech and gestures. A small sketch combining the two follows.
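A small, purely illustrative sketch of how a pen-selected region and a spoken query could be combined; all names and coordinates are hypothetical.

    def objects_in_region(objects, region):
        """A pen gesture supplies a rectangular region (x0, y0, x1, y1) and
        speech supplies the query; return the map objects inside the region.
        'objects' maps name -> (x, y)."""
        x0, y0, x1, y1 = region
        return [name for name, (x, y) in objects.items()
                if x0 <= x <= x1 and y0 <= y <= y1]

    landmarks = {"hospital": (3, 4), "school": (8, 9), "park": (5, 2)}
    # The user circles the lower-left part of the map and asks "what is here?"
    print(objects_in_region(landmarks, (0, 0, 6, 6)))   # ['hospital', 'park']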

Multimodal Human-Robot Interface Applications

Similar to some map-based interfaces, human-robot interfaces usually have to provide mechanisms for pointing to particular locations and for expressing operation-initiating requests. As discussed earlier, the former type of interaction is well accommodated by gestures, whereas the latter is better accommodated by speech. Thus, the human-robot interface built by the Naval Research Laboratory (NRL) should come as no surprise [71]. NRL's interface allows users to point to a location while saying “Go over there”. Additionally, it allows users to use a PDA screen as a third possible avenue of interaction, which can be resorted to when speech or hand gesture recognition fails. Another multimodal human-robot interface is the one built by Interactive System Laboratories (ISL), in which speech is used to request that the robot do something while gestures point to the objects the speech refers to. One such example is asking the robot to “switch on the light” while pointing to the light. Additionally, in ISL's interface, the system may ask the user for clarification when it is unsure about the input. For instance, if no hand gesture pointing to a light is recognized, the system may ask the user: “Which light?” A toy sketch of this clarification behaviour follows.
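The sketch below is a toy illustration of that clarification behaviour, not ISL's actual dialogue system; the utterances and object names are hypothetical.

    def interpret_command(utterance, pointed_object=None):
        """If the spoken command refers to an object ('the light') but no
        pointing gesture resolves it, ask the user instead of guessing."""
        if "light" in utterance:
            if pointed_object is None:
                return "ask_user", "Which light?"
            return "execute", f"switch on {pointed_object}"
        return "ask_user", "Sorry, I did not understand."

    print(interpret_command("switch on the light", pointed_object="desk lamp"))
    print(interpret_command("switch on the light"))   # ('ask_user', 'Which light?')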

Multimodal HCI in Medicine

By the early 1980s, surgeons were beginning to reach the limits of traditional methods alone. The human hand was unsuited to many tasks, and greater magnification and smaller tools were needed. Higher precision was required to localize and manipulate within small and sensitive parts of the human body. Digital robotic neurosurgery emerged quickly as a leading solution to these limitations, driven by vast improvements in engineering, computer technology, and neuro-imaging techniques, and robotic surgery was introduced into the operating room.
The neurosurgical robot consists of the following main components: an arm, feedback vision sensors, controllers, a localization system, and a data processing centre. The sensors provide the surgeon with feedback from the surgical site through real-time imaging, and the surgeon in turn updates the controller with new instructions for the robot through the computer interface and joysticks. A minimal sketch of this feedback loop follows.
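The sketch below is only a schematic of the loop just described; the three callables are hypothetical stand-ins for the sensing, surgeon-input, and actuation subsystems.

    import time

    def control_loop(read_sensors, read_joystick, move_arm, hz=50):
        """Run the sense -> surgeon input -> actuate loop at a fixed rate
        until interrupted. All three callables are assumed to be provided."""
        period = 1.0 / hz
        while True:
            feedback = read_sensors()            # real-time imaging / haptic feedback
            setpoint = read_joystick(feedback)   # surgeon issues a new instruction
            move_arm(setpoint)                   # controller actuates the robot arm
            time.sleep(period)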

Source: http://s2is.org/Issues/v1/n1/papers/paper9.pdf
