
Duplicate negative samples exist for a user #72

@swyo

Description


Hello, @hexiangnan

First of all, thank you for sharing your code with everyone reviewing it.

I reviewed your code and explored the preprocessed data in the Data folder.

I found something strange: the negative samples for a given user contain duplicate items.

Section 4.1 (Evaluation Protocols) of the paper contains the following sentence:

we followed the common strategy [6, 21] that randomly samples 100 items that are not interacted by the user, ranking the test item among the 100 items.

Although you mentioned that negatives are sampled with replacement, I think it is more reasonable to sample each user's negatives without replacement.

Otherwise, the NDCG on the test set would be overestimated.
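Sampling without replacement, as proposed above, can be sketched as follows. This is a minimal sketch, assuming you have each user's set of interacted item ids and the full range of item ids; the function `sample_negatives` and its parameters are hypothetical and not part of the repository.

```python
import random

def sample_negatives(interacted, all_items, num_neg=99, seed=None):
    """Draw num_neg distinct items the user has not interacted with (no replacement)."""
    rng = random.Random(seed)
    # Sorted so that a fixed seed gives reproducible draws
    candidates = sorted(set(all_items) - set(interacted))
    if len(candidates) < num_neg:
        raise ValueError("not enough non-interacted items to sample from")
    # random.sample draws without replacement, so no item can appear twice
    return rng.sample(candidates, num_neg)

# Example: a user who interacted with items 1, 2, 3 in a 1000-item catalog
negs = sample_negatives(interacted={1, 2, 3}, all_items=range(1000), num_neg=99, seed=0)
assert len(set(negs)) == 99  # all 99 negatives are distinct
```

Building the candidate list costs time proportional to the catalog size for each user; for very large catalogs, rejection sampling into a set is a common alternative that gives the same guarantee.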

For example, the following scenario can happen.

If the given negative samples contain duplicate items, the recommended list can also contain duplicate items.

```python
# Suppose this is the top-10 recommended list for one positive and 99 negatives sampled with replacement.
recs = [10, 11, 11, 11, 9, 29, 102, 204, 23, 2]
gt = [11]
ndcg(recs, gt)
```

The `ndcg` call above returns 1 / log2(1 + 2), because the ground-truth item 11 first appears at rank 2.
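The issue does not show how `ndcg` is defined. A hypothetical single-relevant-item version that reproduces the value above, scoring only the first position where a ground-truth item appears, could look like this (with one relevant item the ideal DCG is 1, so NDCG equals DCG):

```python
import math

def ndcg(recs, gt):
    """NDCG for a single relevant item: 1 / log2(rank + 1) at its first 1-based rank, else 0."""
    for rank, item in enumerate(recs, start=1):
        if item in gt:
            return 1.0 / math.log2(rank + 1)
    return 0.0

recs = [10, 11, 11, 11, 9, 29, 102, 204, 23, 2]
gt = [11]
print(ndcg(recs, gt))  # 1 / log2(3), about 0.6309
```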

This NDCG is not reasonable because item 11 was sampled three times, which means other items lose their chance to be recommended.

Summary

Generally, the items in a recommended list are distinct.
However, your test negative samples contain duplicate items for a given user.

You can reproduce this behavior with the following check:

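```python
# Assumes each batch holds one user's test item plus its sampled negatives, and iid is a torch tensor.
for uid, iid, label in test_loader:
    items = iid.tolist()  # set() on a tensor would not dedupe: torch tensors hash by identity
    assert len(set(items)) == len(items)
```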
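Because `test_loader` depends on the repository's data pipeline, a standalone variant may be easier to run. This sketch assumes a plain mapping from user id to that user's test candidates (the test item plus its negatives); how you build that mapping from the files in the Data folder is left open, and the function name `find_duplicate_candidates` is hypothetical.

```python
from collections import Counter

def find_duplicate_candidates(candidates_by_user):
    """Map each user whose candidate list repeats an item to {item: count} for the repeats."""
    report = {}
    for user, items in candidates_by_user.items():
        repeats = {item: n for item, n in Counter(items).items() if n > 1}
        if repeats:
            report[user] = repeats
    return report

example = {0: [5, 7, 7, 9], 1: [1, 2, 3]}
print(find_duplicate_candidates(example))  # {0: {7: 2}}
```

Unlike a bare `assert`, this reports every affected user and which items repeat, which helps estimate how widespread the duplication is.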
