Duplicated negative samples for a user exist

Hello, @hexiangnan 

First of all, thank you for sharing your code for others to review.

I went through your code and explored the preprocessed data in the `Data` folder.

I found something strange: **duplicated negative samples** exist for a user.

In the [paper](https://arxiv.org/pdf/1708.05031.pdf) Section 4.1 Evaluation Protocols, there is a sentence as follows.
> we followed the common strategy [6, 21] that randomly samples 100 items that are not interacted by the user, ranking the test item among the 100 items.

Regarding replacement in negative sampling, I think it is more reasonable to draw the negative samples for each user without replacement.

**This is because** the NDCG on the test set could otherwise be over-estimated.
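
To make the suggestion concrete, here is a minimal sketch of drawing negatives for one user without replacement. The function name `sample_negatives`, its parameters, and the assumption that item ids run from 0 to `num_items - 1` are all hypothetical and not taken from the repository.

```python
import random

def sample_negatives(num_items, interacted, k=100, seed=None):
    """Draw k distinct items the user has not interacted with (hypothetical helper)."""
    rng = random.Random(seed)
    # Candidate pool: every item id except those the user interacted with.
    candidates = [i for i in range(num_items) if i not in interacted]
    if len(candidates) < k:
        raise ValueError("not enough non-interacted items to sample from")
    # random.sample draws without replacement, so the result has no duplicates.
    return rng.sample(candidates, k)

# Toy usage with made-up numbers.
negatives = sample_negatives(num_items=1000, interacted={3, 7, 42}, k=100, seed=0)
```

With this approach every sampled id is unique by construction, so the uniqueness check on the candidate list would pass.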

For example, the following scenario can happen.

If the negative samples for a user contain duplicated items, the recommended list can also contain duplicated items.
```python
# Suppose this is the top-10 recommended list for one positive and 99 negative samples drawn with replacement.
recs = [10, 11, 11, 11, 9, 29, 102, 204, 23, 2]
gt = [11]
ndcg(recs, gt)
```
The `ndcg` call above returns `1 / log2(1 + 2)`.

This NDCG is not reasonable because item 11 was sampled 3 times. As a result, other items lose their chance to be recommended.
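
The value quoted above can be checked with a small sketch. The `ndcg` function here is a hypothetical helper, assuming a single relevant item and crediting only its first occurrence in the list; it may differ from the metric code in the repository.

```python
import math

def ndcg(recs, gt):
    """NDCG with a single relevant item: only the first hit is credited."""
    for rank, item in enumerate(recs, start=1):
        if item in gt:
            # Ideal DCG is 1 (hit at rank 1), so NDCG equals the discounted gain.
            return 1 / math.log2(1 + rank)
    return 0.0

recs = [10, 11, 11, 11, 9, 29, 102, 204, 23, 2]
gt = [11]
score = ndcg(recs, gt)  # first hit is at rank 2, i.e. 1 / log2(1 + 2)
```

Note that the duplicated entries occupy three of the ten slots, so only eight distinct items are actually ranked in this top-10 list.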

## Summary

Generally, the items in a recommended list are distinct.
However, your test negative samples contain duplicated items for a user.

Please check with the following snippet, which reproduces the unreasonable behavior.
```python
for uid, iid, label in test_loader:
  assert len(set(iid)) == len(iid)
```
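
To see which items are repeated rather than only failing an assertion, a sketch like the following may help. `duplicated_items` is a hypothetical helper, and the toy list stands in for one user's candidate items; how `test_loader` yields `iid` is not shown here, so the input type is an assumption.

```python
from collections import Counter

def duplicated_items(item_ids):
    """Return {item_id: count} for every id that appears more than once."""
    counts = Counter(item_ids)
    return {item: n for item, n in counts.items() if n > 1}

# Toy candidate list for one user; the real lists come from the test data.
example_iids = [10, 11, 11, 11, 9, 29]
dups = duplicated_items(example_iids)
```

If `iid` is a tensor or array rather than a list of plain integers, converting it first (for example with `.tolist()`) is safer, since set and counter operations on tensor elements may not compare by value.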

