1. Field of the Invention

The present invention relates to an object recognition apparatus and method for recognizing a target object in an image and executing predetermined processing based on the recognition result.

2. Description of the Related Art

Techniques for recognizing a specific object in an image by learning are known. For example, Japanese Unexamined Patent Publication No. 2002-319726 discloses a technique in which a robot is instructed to perform predetermined processing based on the recognition result. The robot is expected to move so that the recognition result can be obtained as quickly as possible. Accordingly, a mechanism for moving the robot is incorporated into the robot, and high-accuracy object recognition is achieved by moving the camera used for object recognition as well as the robot itself. However, when the camera is moved in order to achieve high-accuracy object recognition and the same camera is used to recognize different objects, an unintended object may be erroneously recognized due to an external force or the like. To eliminate this problem, a plurality of cameras may be prepared, and the recognition target object may be recognized with the optimal camera. In this case, if the plurality of cameras are operated simultaneously, the processing load may become extremely large. On the other hand, a technique for simultaneously capturing image data from a plurality of cameras using multiplexing is known. If this technique is applied to object recognition, the aforementioned problems may be solved. In a typical image display device, an analog or digital TV image signal is first converted into image data for a personal computer and then displayed on a display device.
For example, in the case of an image displayed on a VGA (Video Graphics Array) display having a resolution of 640 dots (horizontal) × 480 dots (vertical), the maximum horizontal scanning frequency is 30 kHz, and the maximum vertical scanning frequency is 60 Hz. Accordingly, the pixel rate (imaging frequency) is 640 × 480 × 60 = 18,432,000 pixels per second. In this case, when image data of 30,000 pixels are simultaneously captured by two cameras through multiplexing, the pixel rate available to each camera is approximately half of the VGA pixel rate, that is, about 9,216,000 pixels per second. Therefore, by employing the multiplexing technique, simultaneous image data capture by a plurality of cameras may be realized. On the other hand, the processing speed of an object recognition program that executes recognition processing based on image data captured by a plurality of cameras is lower than the imaging frequency (pixel rate) of an individual camera. Specifically, when image data are captured by two cameras, the processing speed of the object recognition program is one tenth of the image data capturing frequency. Accordingly, in order to carry out object recognition processing at a sufficiently high speed, it is necessary to use, for example, a general-purpose CPU having a high processing speed and to send the image data captured by the two cameras to the CPU simultaneously. However, when the simultaneous capture of image data from a plurality of cameras is handled by the CPU, the cost increases and a dedicated program must be created. Furthermore, in general, the image data capturing speed is limited by mechanical restrictions on the simultaneous operation of a plurality of cameras. In addition, the cost of the display device and the PC increases, and the processing performance of the entire system may be lowered by the data processing load and the like.
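As a rough illustration of the pixel-rate arithmetic discussed above, the following sketch computes the active pixel rate of a 640 × 480 display refreshed at 60 Hz and the share available to each camera when the channel is divided evenly among the cameras. The function names, the even split, and the neglect of blanking intervals and switching overhead are assumptions made for illustration only and are not taken from the cited art.

```python
def active_pixel_rate(width: int, height: int, refresh_hz: int) -> int:
    """Active pixels delivered per second (blanking intervals are ignored)."""
    return width * height * refresh_hz


def per_camera_rate(total_rate: float, num_cameras: int) -> float:
    """Even time-division share of the channel for each camera.

    Assumption: the channel is split equally and switching costs nothing.
    """
    if num_cameras < 1:
        raise ValueError("num_cameras must be at least 1")
    return total_rate / num_cameras


# VGA at 60 Hz: 640 * 480 * 60 active pixels per second.
vga_rate = active_pixel_rate(640, 480, 60)

# Two multiplexed cameras: each receives half of the VGA pixel rate.
two_camera_share = per_camera_rate(vga_rate, 2)

print(vga_rate, two_camera_share)
```

Under these assumptions, adding cameras reduces the per-camera pixel rate in inverse proportion to the number of cameras, which is why the recognition program's throughput, rather than the capture channel alone, may become the limiting factor.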
For example, even if the frequency of image data capture by a camera is 8 [Hz] and the refresh rate of the display is 60 [Hz], it is possible to perform display on the basis of the image data at a rate of 16 [Hz], which is twice the capture frequency of 8 [Hz]. However, if the display refresh rate is doubled in order to prevent display errors caused by a slight displacement between the imaging system and the display surface, flickering may occur. Therefore, there is a limitation in using the refresh rate of the display as the maximum speed of image data capture. Furthermore, there are cases where image data are transmitted to a remote computer via a network. In general, the required data transmission speed is determined by the data size and the number of pixels transmitted per second (e.g., 1-3 Mbps for VGA resolution, 20-50 Mbps for 1,280×800 resolution, and 150-250 Mbps for 1,920×1,200 resolution). It is extremely difficult for a general-purpose CPU that executes object recognition to simultaneously transmit image data captured by a plurality of cameras in real time. As described above, conventionally, the effective data transmission speed becomes low when image data captured by a plurality of cameras are transmitted simultaneously. On the other hand, if the image data captured by the plurality of cameras are transmitted successively rather than simultaneously, there is no problem in terms of data amount. However, the longer it takes to transmit the image data, the lower the accuracy of object recognition becomes. Accordingly, a first problem to be solved by the present invention is that, even if it is attempted to simultaneously capture image data with a plurality of cameras and execute object recognition processing, high-speed object recognition processing cannot be executed, and the recognition target object may not be recognized.
A second problem to be solved by the present invention is that, even if it is attempted to simultaneously capture image data with a plurality of cameras and execute object recognition processing, the image data captured by the plurality of cameras cannot be transmitted in real time, and the recognition target object may not be recognized. A third problem to be solved by the present invention is that, even if the image data captured by the plurality of cameras are transmitted in real time, the recognition target object may not be recognized when object recognition processing is executed on the image data that are transmitted in real time and displayed on the display device.
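The trade-off between simultaneous and successive transmission described in the related art can be sketched as follows. The example uses the 50 Mbps upper figure quoted above for 1,280×800 resolution; the 100 Mbps link capacity, the 24-bit uncompressed frame size, and all function names are hypothetical values chosen only to make the arithmetic concrete.

```python
def aggregate_bitrate_mbps(per_camera_mbps: float, num_cameras: int) -> float:
    """Total bit rate when all camera streams are sent at the same time."""
    return per_camera_mbps * num_cameras


def fits_link(per_camera_mbps: float, num_cameras: int, link_mbps: float) -> bool:
    """True if the simultaneous streams fit within the link capacity."""
    return aggregate_bitrate_mbps(per_camera_mbps, num_cameras) <= link_mbps


def successive_delay_s(frame_bits: int, link_mbps: float, num_cameras: int) -> float:
    """Time until the last camera's frame has been sent, one frame after another."""
    return num_cameras * frame_bits / (link_mbps * 1e6)


LINK_MBPS = 100.0  # hypothetical link capacity (assumption)

# Simultaneous transmission: two 50 Mbps streams just fit, three do not.
two_fit = fits_link(50.0, 2, LINK_MBPS)
three_fit = fits_link(50.0, 3, LINK_MBPS)

# Successive transmission of one uncompressed 24-bit 1,280x800 frame per camera
# (uncompressed 24-bit frames are an assumption).
frame_bits = 1280 * 800 * 24
delay_two = successive_delay_s(frame_bits, LINK_MBPS, 2)  # about 0.49 s

print(two_fit, three_fit, delay_two)
```

Under these assumed figures, simultaneous transmission stops fitting the link once a third camera is added, while successive transmission always fits but delays the last frame by roughly the number of cameras multiplied by the single-frame transmission time.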