Custom interaction design and usage

Implementation process


Overall effect

The VAD (voice activity detection) algorithm judges the surrounding sound energy in real time and decides, according to an energy threshold, whether the audio should be recognized as speech. If it is recognized as speech, the audio is saved and passed through offline keyword detection to determine whether it contains the wake-up keyword. If the wake-up keyword is present, the result is fed back to the wake-up state machine; otherwise the robot remains dormant.

When the wake-up state machine is in the awake state and the command issued is a valid one, the robot executes the corresponding command and then returns the state machine to sleep mode.

The above is the overall flow of voice interaction described in our previous tutorial.

In this chapter, we explain how users can customize their own wake-up word and develop their own voice interaction actions.


The basis of voice interaction: setting the command words

First we need to modify iFLYTEK's grammar file handsfree_speech/cfg/msc/res/asr/talking.bnf:

#BNF+IAT 1.0 UTF-8;
!grammar control;

!slot <move>;
!slot <time>;
!slot <wake>;

!start <callstart>;
<callstart>:<control>;
<control>:<wake>|<move>;


<wake>: Trail!id(99999)|Trail Trail!id(99999); // (the wake-up word can be modified here)

<move>: move forward!id(10001)|forward!id(10001)|backward!id(10002)|Turn left!id(10003)|Turn right!id(10004)|Exit voice mode!id(10005)|Shut down!id(10005); // (command words can be modified here)

Our voice interaction feature relies on the VAD algorithm to monitor the surrounding sound environment in real time. We start a dedicated thread that runs the VAD algorithm continuously and judges whether the detected sound reaches the energy threshold. When the threshold is reached, the audio that triggered it is recorded and saved as a wav file.
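For illustration, a minimal sketch of this kind of energy check (assuming 16-bit PCM frames; the helper name frame_is_speech and the RMS formulation are our own, not the exact HandsFree implementation) might look like this:

#include <cmath>
#include <cstddef>
#include <cstdint>

// Sketch of the per-frame energy check inside a VAD loop (assumed form).
// Computes the RMS energy of one frame of 16-bit PCM samples and compares
// it against a threshold; frames above the threshold count as speech.
static bool frame_is_speech(const int16_t *samples, size_t n, double threshold)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += (double)samples[i] * (double)samples[i];
    double rms = std::sqrt(sum / (double)n);
    return rms > threshold;
}

In the real node this check runs in its own thread; consecutive speech frames are appended to a buffer and written out as the wav file mentioned above.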

By modifying the grammar file, we can use iFLYTEK's 'offline voice command word recognition' function to recognize the command words in the wav file saved earlier, which gives us real-time voice command word recognition. Therefore, after modifying the corresponding wake-up words and command words in the grammar file, we can wake the robot up or issue commands with different voice phrases, thereby realizing custom voice wake-up and voice control.

The grammar above defines our default voice commands; below we explain how users can customize it and add a function of their own.

Here is our data processing method:
static int16_t get_order(char *_xml_result){
  if(_xml_result == NULL){
    printf("no result\n");
    return -1;
  }
  // Get the confidence: skip the first <confidence> field (the overall one)
  // and read the second, which belongs to the recognized slot
  char *str_con_first1 = strstr(_xml_result,"<confidence>");
  //printf("\n%s",str_con_first1);
  char *str_con_second1 = strstr(str_con_first1,"</confidence");
  //printf("\n%s",str_con_second1);
  char *str_con_first2 = strstr(str_con_second1,"<confidence>");
  //printf("\n%s",str_con_first2);
  char *str_con_second2 = strstr(str_con_first2,"</confidence");
  //printf("\n%s",str_con_second2);
  char str_confidence2[10] = {};
  // +12 skips over the "<confidence>" tag itself
  strncpy(str_confidence2, str_con_first2+12, str_con_second2 - str_con_first2-12);
  //printf("\n%s\n",str_confidence2);
  // Locate the <object> block and copy out the 5-digit command id
  char *str_con_first = strstr(_xml_result,"<object>");
  char *str_con_second = strstr(str_con_first,"</object");
  char str_confidence[10] = {};
  char str_order[6] = {};  // 5 digits plus terminating '\0'
  strncpy(str_confidence, str_con_first+25, str_con_second - str_con_first-45);
  memcpy(str_order,str_confidence,5);
  //printf("\n%s\n",str_order);
  std_msgs::String msg_pub;
  if(atoi(str_confidence2) < 40)
   {
      // Confidence too low: publish the "invalid command" id instead
      msg_pub.data = "00001";
      pub.publish(msg_pub);
   }
   else
   {
      // Publish the recognized command id on the ROS topic
      msg_pub.data = str_order;
      pub.publish(msg_pub);
   }
  return 0;
}

By running the demo provided by iFLYTEK, we can see the data fed back by the engine as shown in the following example:

<?xml version='1.0' encoding='utf-8' standalone='yes' ?><nlp>
  <version>1.1</version>
  <rawtext>Call Ding Wei</rawtext>
  <confidence>94</confidence>
  <engine>local</engine>
  <result>
    <focus>dialpre|contact</focus>
    <confidence>81|100</confidence>
    <object>
      <dialpre id="10001">Call</dialpre>
      <contact id="65535">Ding Wei</contact>
    </object>
  </result>
</nlp>

This data contains the recognition results; we mainly focus on the ID numbers inside <object> and the corresponding confidence values.

Here we extract the required information by searching for the corresponding character elements to locate the relevant data, copy the needed fields into arrays, convert them into ROS topic data of type std_msgs/String, and use the ROS topic to dispatch the corresponding feedback.
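The global publisher pub used by get_order is assumed to be created during node initialization. A minimal sketch of that setup (the node name "asr_node" is hypothetical; "order_todo" matches the topic subscribed to in the Python example later in this chapter):

#include <ros/ros.h>
#include <std_msgs/String.h>

ros::Publisher pub;  // global publisher used by get_order()

int main(int argc, char **argv)
{
    ros::init(argc, argv, "asr_node");  // hypothetical node name
    ros::NodeHandle nh;
    // Recognized command ids are published as std_msgs/String messages
    pub = nh.advertise<std_msgs::String>("order_todo", 10);
    // ... start the VAD thread and the recognition loop here ...
    ros::spin();
    return 0;
}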

You can also uncomment the printf statements above to see each step of the extraction.

Here we mainly extract the ID number and the confidence. The ID number tells us which command was recognized, and the confidence tells us how likely it is that the recognized command is the one we actually spoke. Commands with low confidence can also be screened out and ignored. The threshold we set here is 40, and this value can be changed as needed: if the threshold is too low, wrong commands are easily triggered; if it is too high, valid commands may be rejected.


Since the robot uses the VAD algorithm to detect sound in real time, any noise in the surrounding environment will also be detected. Therefore, in the code we represent different states with flag bits and use them to implement a simple state machine. The state machine then decides whether the conditions for executing the corresponding action are currently met.
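In its simplest form, this flag-based state machine amounts to the following sketch (in the demo the flag is the ifwake variable that appears in the code below; the function handle_id is our own illustration):

// Sketch of the flag-based wake state machine.
// ifwake == 0: dormant, only the wake-up word (id 99999) is accepted;
// ifwake == 1: awake, command ids are executed and the flag is cleared.
static int ifwake = 0;

static void handle_id(int todo_id)
{
    if (todo_id == 99999) {    // wake-up word recognized
        ifwake = 1;
    } else if (ifwake == 1) {  // a valid command while awake
        // ... execute the command ...
        ifwake = 0;            // return to sleep mode
    }
    // otherwise: ignore sound detected while dormant
}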

The demo here splits wake-up and the functions into two modules to meet our control requirements. More diversified control can be built on this pattern, although the required design logic and the aspects to consider become more complicated; here we only provide a simple demo.

First, let's look at the design of our voice wake-up:
  int ret;
  time( &rawtime );
  info = localtime( &rawtime );
  int todo_id = atoi(msg->data.c_str());
  if(todo_id==99999)  // the wake-up word was recognized
  {
    char* text;
    // Pick one of several reply sentences at random (see the note below)
    int x = random(0, 5);
    if(x==1)
    {
      text = (char*)"what's the matter"; // text to synthesize
    }
    else if(x==2)
    {
      text = (char*)"what's up"; // text to synthesize
    }
    else if(x==3)
    {
      text = (char*)"come"; // text to synthesize
    }
    else
    {
      text = (char*)"I am"; // text to synthesize
    }
    printf("Received wakeup command\n");
    ret = text_to_speech(text, filename_move, session_begin_params);
    if (MSP_SUCCESS != ret)
    {
      printf("text_to_speech failed, error code: %d.\n", ret);
    }
    printf("Completed\n");
    ifwake = 1;  // enter the awake state
    std_msgs::String msg_pub;
    msg_pub.data = "stop";
    pub.publish(msg_pub);
    play_wav((char*)"/home/handsfree/catkin_ws/src/handsfree_voice/res/tts_sample_move.wav");
    msg_pub.data = "tiago";
    pub.publish(msg_pub);
  }

Here we use a random function. It is pseudo-random: when the random seed is fixed, the result sequence is also fixed. Even so, this design improves the interaction between the robot and the user, because the robot's feedback sentence is no longer fixed and rigid but is selected at random from several candidate replies.
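The random(min, max) helper used above is not part of the C standard library. A minimal sketch of such a helper (an assumption; the original implementation is not shown in this excerpt):

#include <cstdlib>
#include <ctime>

// Hypothetical helper: returns a pseudo-random integer in [min, max).
// Seeding with the current time makes the reply vary between runs; with a
// fixed seed the sequence repeats, which is the pseudo-randomness noted above.
static int random(int min, int max)
{
    static bool seeded = false;
    if (!seeded) {
        srand((unsigned)time(NULL));
        seeded = true;
    }
    return min + rand() % (max - min);
}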


Next, we explain how to add a new function.

Suppose we add a time interaction function.

Then you need to append |<time> to the rule <control>:<wake>|<move>; so that it becomes: <control>:<wake>|<move>|<time>; (the slot <time> is already declared at the top of the grammar).

Then add the rule <time>: time!id(10006)|what time is it!id(10006)|now!id(10006);
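Putting these two changes together, the relevant part of talking.bnf then reads:

!slot <time>; // already declared in the default grammar

!start <callstart>;
<callstart>:<control>;
<control>:<wake>|<move>|<time>;

<time>: time!id(10006)|what time is it!id(10006)|now!id(10006);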

After this addition, when the user talks to the robot and uses one of the time command words, the program will feed back the corresponding id (10006).

We can then modify the functions in the speech synthesis file tts_offline_sample.cpp so that the robot performs different behaviors when the corresponding id command is received.

for example:

  if(todo_id==10006 && ifwake==1)  // time command received while awake
  {
      // Format the current time as year/month/day hour:minute:second
      strftime(buffer, 80, "The current time is: %Y year %m month %e day, %H hour %M minute %S second", info);
      const char* text = buffer; // text to synthesize
      ret = text_to_speech(text, filename, session_begin_params);
      if (MSP_SUCCESS != ret)
      {
          printf("text_to_speech failed, error code: %d.\n", ret);
      }
      printf("Received time instruction\n");
      printf("Completed\n");
      play_wav((char*)"/home/handsfree/talking/talking/catkin_ws/src/offlinelistener/config/tts_sample.wav");
      ifwake = 0;  // return to sleep mode after executing the command
      std_msgs::String msg_pub;
      msg_pub.data = "stop";
      pub.publish(msg_pub);
  }

The condition if(todo_id==10006 && ifwake==1) checks that the time command was received and that the robot is in the wake-up state; only when both conditions are met at the same time are the subsequent statements executed.

For example, the above code makes the robot, once the conditions are met, read the current time from the computer, convert that time information into a wav file via iFLYTEK's speech synthesis using the sentence template we set, and play the audio through the external speaker, thereby realizing human-computer voice interaction.

The shutdown command (default ID: 10005) here is also very important!

Since our VAD algorithm is controlled by a dedicated thread running a loop, and ROS itself effectively runs another thread, the process would hang if the two were not shut down in a unified way. Therefore we set up a shutdown command that can be used to end our other threads. The same applies to the voice patrol function added later: whenever a thread class or looping ROS code is added, it can be stopped through this command via the corresponding shutdown topic.

for example:

import rospy
from std_msgs.msg import String

start = 1  # global flag: 1 = keep looping, 0 = exit

def roscallback(data):
    # When the shutdown command (id 10005) arrives, clear the global flag so
    # that the loop below exits and the program ends (or the thread is killed).
    # After VAD has stopped, the node can no longer accept voice data, so we
    # jump out of the loop here.
    global start
    if int(data.data) == 10005:
        start = 0
        print start
    rospy.loginfo(data.data)

pub = rospy.Publisher('vad_listener_hear', String, queue_size=10)
rospy.init_node('vad_listener', anonymous=True)
sub = rospy.Subscriber("order_todo", String, roscallback)
rate = rospy.Rate(10)  # 10hz
while start == 1:
    # the task to be executed in each loop iteration goes here
    rate.sleep()

By modifying these parameter configurations, we can call the functions provided by the ROS packages on the robot to realize other voice interaction features. Demos of Voice Control, Voice Navigation and Voice Patrol are provided in the HandsFree package, and they will be explained separately in subsequent tutorials.

The following link lets you query iFLYTEK error code information. If a relevant error code is reported when calling iFLYTEK, you can look it up there: Inquiry about iFLYTEK error code information
