Description
Is your feature request related to a problem? Please describe.
I am currently importing large amounts of commits at once for analysis. Grabbing every detail of every commit at once means that imports sometimes take a LONG time. Processing every property of every commit in the Linux kernel, for example, takes about three weeks if you run pydriller continuously.
Describe the solution you'd like
I would love to be able to run pydriller inside a ThreadPoolExecutor block so as to parallelize (and ideally speed up) this process. Currently, when I run pydriller in a ThreadPoolExecutor block, pieces of whatever commit I'm processing get chopped up and passed in as the wrong values. For example, I've seen a Python function passed in as `commit.author.name`, when it should be in `commit.methods`.
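For illustration, here is a minimal sketch of the kind of usage being asked for. It assumes the mixed-up values come from threads sharing one repository object, which is a guess and not something confirmed in this issue, so each batch of commit hashes gets its own `Repository` instance and only plain values are returned from the workers. The helper names (`chunked`, `summarize_batch`, `summarize_parallel`) and the batch size are hypothetical; `Repository`, its `only_commits` filter, and `commit.modified_files` are taken from PyDriller 2.x. Whether this actually avoids the problem, or gives any speedup with threads, is untested.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def summarize_batch(repo_path, hashes):
    """Traverse one batch using a Repository object private to this call."""
    # Imported here so the batching helper can be used without PyDriller installed.
    from pydriller import Repository

    rows = []
    for commit in Repository(repo_path, only_commits=hashes).traverse_commits():
        # Copy plain values out; Commit objects are not handed to other threads.
        rows.append((commit.hash, commit.author.name, len(commit.modified_files)))
    return rows


def summarize_parallel(repo_path, hashes, batch_size=500, workers=4):
    """Process batches of commit hashes concurrently and flatten the results."""
    batches = chunked(hashes, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the batches.
        results = pool.map(lambda batch: summarize_batch(repo_path, batch), batches)
        return [row for batch in results for row in batch]
```

The list of hashes would have to be collected up front, for example from `git rev-list`. Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` is another option to try, although the lambda would then need to be replaced with a picklable function.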
Additional context
Not a bug - I bet this was just not thought about in the original scope of the project. Thank you for all that you do!