In the case of "Matthew", the Jaccard coefficient is calculated like this:

a) Number of H5 units in part 01-07 that contain Matthew: 90
b) Number of H5 units in part 01-07 that do not contain Matthew: 338 - 90 = 248
c) Number of H5 units not in part 01-07 that contain Matthew: 285 - 90 = 195

Jaccard coefficient: a / (a + b + c) = 90 / (90 + 248 + 195) = 90 / 533 ≈ 0.168856
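
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same calculation. The function and variable names are made up for illustration and are not taken from KH Coder; the only inputs are the three counts listed above.

```python
def jaccard_from_counts(both, part_only, word_only):
    """Jaccard coefficient from three unit counts.

    both      -- units in the part that contain the word (a)
    part_only -- units in the part that do not contain the word (b)
    word_only -- units outside the part that contain the word (c)
    """
    union = both + part_only + word_only
    if union == 0:
        raise ValueError("no units contain the word or belong to the part")
    return both / union


# Counts from the "Matthew" example (hypothetical variable names).
units_in_part = 338    # H5 units belonging to part 01-07
units_with_word = 285  # H5 units containing "Matthew" anywhere
a = 90                 # units in part 01-07 that contain "Matthew"
b = units_in_part - a      # 248
c = units_with_word - a    # 195

coefficient = jaccard_from_counts(a, b, c)
print(f"{coefficient:.6f}")  # prints 0.168856
```

Note that a + b + c = 338 + 285 - 90 = 533, i.e. the number of units that either belong to part 01-07 or contain "Matthew" (or both).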

BTW, "number of H5 units" means "number of cells in the Excel file": each H5 unit corresponds to one cell.

It would be more accurate to say that the calculation is based on the probability of a word occurring within a given unit, rather than on word frequency. Because of this calculation, if you select a smaller unit such as "senten…

Answer selected by Andrealee1993