NLPCC-2023-Shared-Task-9
User Feedback Prediction and Response Generation
Overview
◇ Task 9 - User Feedback Prediction and Response Generation
Online conversation systems usually provide a user feedback mechanism, such as like and dislike buttons: a user who is satisfied with a response can click the like button, and a dissatisfied user the dislike button. This feedback signal is the user's vote on the quality of the response and also reflects his/her preferences. How to use this signal to improve the quality of a conversation system is a direction worth studying and investing in. This task includes two tracks:
● Track 1: Prediction of likes and dislikes: Given a (query, reply) pair, predict the probabilities of likes and dislikes.
● Track 2: Conversation generation based on likes and dislikes: Incorporate like and dislike data into conversation generation to improve response quality and obtain more likes.
Organizers: Renmin University of China and XiaoMi AI Lab
Contact: Shuang DONG (dongshuang1@xiaomi.com)
Data
Statistics
Example
We provide two files, train.jsonl and dev.jsonl. Each line in a file is one item in JSON format; the following shows the parsed result of one such item.
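Since each line of train.jsonl and dev.jsonl is a standalone JSON object, the files can be loaded with a short helper. This is a minimal sketch that assumes only the one-object-per-line layout; the actual field names inside each item depend on the dataset:

```python
import json

def load_jsonl(path):
    """Load a .jsonl file (e.g. train.jsonl / dev.jsonl): one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```
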
LeaderBoard
Track 1
Final result:
Track 2
Final result:
SUBMISSION FORMAT
Track 1
For Track 1, the test dataset is named datasets_test_track1.jsonl, which consists of 1500 samples. Participants are required to submit their results with the same number of rows as the test dataset. Each row should contain one or more scores separated by tabs (\t); the number of scores in a row equals the number of replies corresponding to that query. The required format is as follows:
0.6
0.6
...
0.6\t0.6\t0.6
For each question-answer pair, a ground-truth probability distribution over the 0 and 1 scores is computed from the ratio of likes and dislikes. Submissions are scored as 1/(1+kl), where kl is the Kullback-Leibler divergence between the predicted probability distribution and the ground truth. Please refer to the evaluation.py file for more details.
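The scoring rule above can be sketched in a few lines of Python. evaluation.py is authoritative; in particular, the direction of the KL divergence and the mapping from like/dislike counts to the ground-truth distribution over {0, 1} are assumptions in this sketch:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def track1_score(pred, likes, dislikes):
    """Score one (query, reply) pair as 1 / (1 + kl).

    `pred` is the predicted [P(0), P(1)] distribution. The ground truth is
    assumed to come from the vote ratio (dislike -> 0, like -> 1), with at
    least one vote recorded for the pair.
    """
    total = likes + dislikes
    truth = [dislikes / total, likes / total]
    return 1.0 / (1.0 + kl_divergence(pred, truth))
```

A perfect prediction gives kl = 0 and hence the maximum score of 1.0; the score decays toward 0 as the predicted distribution drifts from the vote ratio.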
Track 2
For Track 2, the test dataset is named datasets_test_track2.jsonl, which contains 500 samples. Participants are likewise required to submit their results with the same number of rows as the test dataset. Each row should contain the reply generated for the corresponding query. The format should be as follows:
不喜欢
在呢
...
不好意思,刚刚走神了
We will use manual annotation to assign a score to each reply: 0 (unlikely to be liked), 1 (potentially liked), or 2 (highly likely to be liked). The final score is the average of these scores.
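A Track 2 submission is plain text with one reply per row, in the same order as the test file. A minimal writer might look like the sketch below; the "query" field name and the `generate` callable are assumptions, so check the actual dataset schema before using it:

```python
import json

def write_track2_submission(test_path, out_path, generate):
    """Write one generated reply per line of the test .jsonl file.

    `generate` is any callable mapping a query string to a reply string
    (e.g. your trained model's decode function).
    """
    with open(test_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            item = json.loads(line)
            reply = generate(item.get("query", ""))  # "query" key is assumed
            # Keep the one-reply-per-row format by flattening any newlines.
            fout.write(reply.replace("\n", " ") + "\n")
```
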
UPDATE
2023.03.22 init
2023.04.04 add data
2023.04.25 add evaluation
2023.05.22 add test data
Licence
Our dataset is licensed under the CC BY 4.0 and our code is licensed under the Apache License 2.0.