My markdown doc is structured as:
# header1
## header2
Some text
## header2
Some more text
### Step 0: this is pre-planning step
* ⚠️ this is a warning
▶ the top line
▶ the next line
▶ the next line
▶ the next line
1. numbered list
1. numbered list
### Step 1: the first actual step
▶ the top line
▶ the next line
▶ the next line
▶ the next line
1. numbered list
1. numbered list
### Step 2: the second step
etc...
My code:
import semchunk
# `text` is the markdown document described above, loaded as a single string.
chunker = semchunk.chunkerify('gpt-4', chunk_size=2000)
chunks = chunker(text)
I would expect the chunker to split at headers when possible, so that each chunk starts with its header; however, the chunks generally END with the header of the next section.
An example chunk:
▶ the top line
▶ the next line
▶ the next line
▶ the next line
1. numbered list
1. numbered list
### Step 2: the second step
...instead of:
### Step 1: the first actual step
▶ the top line
▶ the next line
▶ the next line
▶ the next line
1. numbered list
1. numbered list
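To make the expected boundaries concrete, here is a minimal sketch in plain Python of a split that keeps every header at the start of its section. This is not how semchunk works internally; `split_at_headers` and `HEADER` are hypothetical names introduced only for illustration, and the sketch assumes headers are ATX-style lines (`#` to `######` followed by a space) and that no `#` lines appear inside code blocks.

```python
import re

# Assumption: a header is a line starting with 1-6 '#' characters and a space.
HEADER = re.compile(r"#{1,6} ")


def split_at_headers(text: str) -> list[str]:
    """Split markdown text so that each section begins with its header line."""
    sections: list[str] = []
    current: list[str] = []
    for line in text.splitlines(keepends=True):
        # A new header closes the section collected so far.
        if HEADER.match(line) and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    # Flush the last section.
    if current:
        sections.append("".join(current))
    return sections
```

With a split like this, the header `### Step 2: the second step` opens its own section instead of trailing the previous one, and each section could then be passed to `chunker` on its own.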
Any idea why this is happening?