NLPCC-2023-Shared-Task-9
User Feedback Prediction and Response Generation
Overview
◇ Task 9 - User Feedback Prediction and Response Generation
Online conversation systems usually provide a user feedback mechanism, such as like and dislike buttons: a user who is satisfied with a response can click the like button, and a dissatisfied user the dislike button. This feedback signal is the user's vote on the quality of the response and also reflects his/her preferences. How to use this signal to improve the quality of a conversation system is a direction worth studying and investing in. This task includes two tracks:
● Track 1: Prediction of likes and dislikes: Given a (query, reply) pair, predict the probabilities of likes and dislikes.
● Track 2: Conversation generation based on likes and dislikes: Incorporate like and dislike data into conversation generation to improve response quality and obtain more likes.
Organizers: Renmin University of China and XiaoMi AI Lab
Contact: Shuang DONG (dongshuang1@xiaomi.com)
Data
Statistics
Example
We provide two files, train.jsonl and dev.jsonl. Each line in a file is one item in JSON format; the following shows the parsed result of one such item.
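Since each line of train.jsonl and dev.jsonl is a standalone JSON object, the files can be loaded with a short helper. This is a minimal sketch that assumes only the one-object-per-line layout; the actual field names inside each item depend on the dataset:

```python
import json

def load_jsonl(path):
    """Load a .jsonl file (e.g. train.jsonl / dev.jsonl): one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```
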
LeaderBoard
Track 1
Final result:
Track 2
Final result:
SUBMISSION FORMAT
Track 1
For Track 1, the test dataset is named datasets_test_track1.jsonl, which consists of 1500 samples. Participants are required to submit their results with the same number of rows as the test dataset. Each row should contain one or more scores separated by tabs (\t); the number of scores in a row equals the number of replies corresponding to that query. The required format is as follows:
0.6
0.6
...
0.6\t0.6\t0.6
For each question-answer pair, a ground-truth probability distribution over the 0 and 1 scores is computed from the ratio of likes and dislikes. Submissions are scored as 1/(1+kl), where kl is the Kullback-Leibler divergence between the predicted probability distribution and the ground truth. Please refer to the evaluation.py file for more details.
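The scoring rule above can be sketched in a few lines of Python. evaluation.py is authoritative; in particular, the direction of the KL divergence and the mapping from like/dislike counts to the ground-truth distribution over {0, 1} are assumptions in this sketch:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def track1_score(pred, likes, dislikes):
    """Score one (query, reply) pair as 1 / (1 + kl).

    `pred` is the predicted [P(0), P(1)] distribution. The ground truth is
    assumed to come from the vote ratio (dislike -> 0, like -> 1), with at
    least one vote recorded for the pair.
    """
    total = likes + dislikes
    truth = [dislikes / total, likes / total]
    return 1.0 / (1.0 + kl_divergence(pred, truth))
```

A perfect prediction gives kl = 0 and hence the maximum score of 1.0; the score decays toward 0 as the predicted distribution drifts from the vote ratio.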
Track 2
For Track 2, the test dataset is named datasets_test_track2.jsonl, which contains 500 samples. Participants are likewise required to submit their results with the same number of rows as the test dataset. Each row should contain the reply generated for the corresponding query. The format should be as follows:
不喜欢
在呢
...
不好意思,刚刚走神了
We will use manual annotation to assign a score to each reply: 0 (unlikely to be liked), 1 (potentially liked), or 2 (highly likely to be liked). The final score is the average of these scores.
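A Track 2 submission is plain text with one reply per row, in the same order as the test file. A minimal writer might look like the sketch below; the "query" field name and the `generate` callable are assumptions, so check the actual dataset schema before using it:

```python
import json

def write_track2_submission(test_path, out_path, generate):
    """Write one generated reply per line of the test .jsonl file.

    `generate` is any callable mapping a query string to a reply string
    (e.g. your trained model's decode function).
    """
    with open(test_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            item = json.loads(line)
            reply = generate(item.get("query", ""))  # "query" key is assumed
            # Keep the one-reply-per-row format by flattening any newlines.
            fout.write(reply.replace("\n", " ") + "\n")
```
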
UPDATE
2023.03.22 init
2023.04.04 add data
2023.04.25 add evaluation
2023.05.22 add test data
Licence
Our dataset is licensed under the CC BY 4.0 and our code is licensed under the Apache License 2.0.