All controllers are motion controllers. A gamepad just uses a different technique to record motion than a camera-based system.
I think there's a difference, though.
Some controllers require a very specific motion. They dictate the motion you need to make. A keyboard is an extreme example of this.
Other controllers simply register whatever motion you make, without dictating or expecting anything (apart, perhaps, from staying within a certain area), as a motion capture system used in animation would. It is then up to the software to process that data.
In other words, simple controllers are made for the benefit of relatively simple computer systems. They produce very little data: for a keyboard key, a single bit (on or off); for a joystick, two floating-point numbers from -1 to 1 representing, e.g., the x and y axes; for a mouse, two integers representing the pixel location of the cursor.
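To make the "very little data" point concrete, here is a rough sketch in Python of what one sample from each of these controllers might look like. The type names (KeyState, JoystickState, MouseState) and the helper values_per_sample are made up for illustration; they do not come from any real input library.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class KeyState:
    """Hypothetical sample from a single keyboard key: one bit."""
    pressed: bool


@dataclass(frozen=True)
class JoystickState:
    """Hypothetical joystick sample: two floats, each from -1 to 1."""
    x: float
    y: float

    def __post_init__(self):
        # Reject values outside the range the hardware is assumed to report.
        for name in ("x", "y"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between -1 and 1, got {value}")


@dataclass(frozen=True)
class MouseState:
    """Hypothetical mouse sample: cursor position in pixels."""
    x: int
    y: int


def values_per_sample(state) -> int:
    """Count how many numbers a single sample of this controller carries."""
    return len(fields(state))
```

Each of these samples carries one or two values, and the software can use them directly without any interpretation.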
But motion capture produces far more data. In its raw state, this data is useless; everything stands or falls with how the software processes it. In a way, the way motion control is currently used seems like a transition phase: the hardware is now capable of capturing a wide variety of input, but the software systems are still very simple, so they need to dictate to the user how to move. This happens to be very compatible with traditional, "sportive" videogame design, in which players need to perform very specific tasks that are then measured against an ideal outcome and scored.
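As an illustration of what "measured against an ideal outcome and scored" could mean in software, here is a minimal sketch in Python. It assumes the captured motion and the ideal motion are both lists of 3D points sampled at the same moments. The function name score_motion, the tolerance parameter and the scoring formula are all assumptions made for this example, not how any particular game does it.

```python
import math


def score_motion(captured, ideal, tolerance=0.5):
    """Score a captured path against an ideal one, from 0.0 (miss) to 1.0 (perfect).

    Both paths are sequences of (x, y, z) points of equal length.
    `tolerance` is a made-up parameter: the mean distance at which the score hits 0.
    """
    if not ideal or len(captured) != len(ideal):
        raise ValueError("paths must be non-empty and of equal length")
    # Mean Euclidean distance between corresponding points of the two paths.
    mean_error = sum(math.dist(c, i) for c, i in zip(captured, ideal)) / len(ideal)
    # Linear falloff: the further from the ideal, the lower the score.
    return max(0.0, 1.0 - mean_error / tolerance)


# The "ideal" swing the game expects, and a player's slightly offset attempt.
ideal_path = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
attempt = [(0.0, 0.25, 0.0), (1.0, 0.25, 0.0), (2.0, 0.25, 0.0)]
print(score_motion(attempt, ideal_path))
```

Note how little of the captured motion such a system actually cares about: anything that deviates from the template just counts as error, which is exactly the sense in which the software dictates the movement.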