GitHub
This notebook shows how to load issues and pull requests (PRs) for a given repository on GitHub. It also shows how to load the files of a given GitHub repository. We will use the LangChain Python repository as an example.
Setup access token
To access the GitHub API, you need a personal access token. You can set
up yours here: https://github.com/settings/tokens?type=beta. You can
either set this token as the environment variable
GITHUB_PERSONAL_ACCESS_TOKEN, in which case it is pulled in automatically,
or pass it in directly at initialization as the access_token named parameter.
# If you haven't set your access token as an environment variable, pass it in here.
from getpass import getpass
ACCESS_TOKEN = getpass()
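The two options described above (environment variable or explicit value) can be combined in one small helper. This is a minimal sketch and not part of LangChain: resolve_github_token is a hypothetical name, and the only thing it assumes is the GITHUB_PERSONAL_ACCESS_TOKEN variable named in the text, with an interactive prompt as the fallback.

```python
import os
from getpass import getpass
from typing import Callable, Mapping

ENV_VAR = "GITHUB_PERSONAL_ACCESS_TOKEN"  # variable name taken from the text above


def resolve_github_token(
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the token from the environment, or prompt for it if unset.

    Hypothetical helper for illustration; not a LangChain API.
    """
    token = env.get(ENV_VAR, "").strip()
    if token:
        return token
    # Fall back to an interactive prompt that does not echo the input.
    token = prompt("GitHub personal access token: ").strip()
    if not token:
        raise ValueError(f"No token provided; set {ENV_VAR} or enter one.")
    return token
```

With this helper, ACCESS_TOKEN = resolve_github_token() would only prompt when the environment variable is missing.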
Load Issues and PRs
from langchain_community.document_loaders import GitHubIssuesLoader
loader = GitHubIssuesLoader(
    repo="langchain-ai/langchain",
    access_token=ACCESS_TOKEN,  # delete/comment out this argument if you've set the access token as an env var.
    creator="UmerHA",
)
Let's load all issues and PRs created by "UmerHA".
Here's a list of all filters you can use:
- include_prs
- milestone
- state
- assignee
- creator
- mentioned
- labels
- sort
- direction
- since
For more info, see https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues.
docs = loader.load()
print(docs[0].page_content)
print(docs[0].metadata)
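Most of the filters listed above share their names with query parameters of the GitHub REST endpoint linked in the text. The sketch below shows what such a request URL could look like. It is an illustration under assumptions, not the loader's implementation: build_issues_url is a hypothetical helper, and include_prs is left out because the REST endpoint does not define a parameter of that name.

```python
from urllib.parse import urlencode

# Filter names from the list above that GitHub accepts as query parameters.
# include_prs is omitted on purpose: the REST endpoint has no such parameter.
API_FILTERS = (
    "milestone", "state", "assignee", "creator", "mentioned",
    "labels", "sort", "direction", "since",
)


def build_issues_url(repo: str, **filters) -> str:
    """Build a "list repository issues" URL (hypothetical helper)."""
    unknown = set(filters) - set(API_FILTERS)
    if unknown:
        raise ValueError(f"Unsupported filters: {sorted(unknown)}")
    params = {}
    for name in API_FILTERS:  # fixed order keeps the URL deterministic
        value = filters.get(name)
        if value is None:
            continue
        # The API expects labels as one comma-separated string.
        if name == "labels" and not isinstance(value, str):
            value = ",".join(value)
        params[name] = value
    query = urlencode(params)
    base = f"https://api.github.com/repos/{repo}/issues"
    return f"{base}?{query}" if query else base


print(build_issues_url("langchain-ai/langchain", creator="UmerHA", state="closed"))
```

Rejecting unknown filter names early surfaces a misspelled filter as an error instead of silently sending an unfiltered request.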
Only load issues
By default, the GitHub API considers pull requests to also be
issues. To get only "pure" issues (i.e., no pull requests), use
include_prs=False:
loader = GitHubIssuesLoader(
    repo="langchain-ai/langchain",
    access_token=ACCESS_TOKEN,  # delete/comment out this argument if you've set the access token as an env var.
    creator="UmerHA",
    include_prs=False,
)
docs = loader.load()
print(docs[0].page_content)
print(docs[0].metadata)
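To illustrate what separating "pure" issues from pull requests means at the API level: in responses from the GitHub issues endpoint, entries that are pull requests carry a pull_request key. The sketch below filters on that key. It is only an illustration of the idea and makes no claim about how include_prs is implemented inside the loader. only_pure_issues is a hypothetical helper, and the sample entries are hand-written, not real data.

```python
from typing import Any, Dict, List


def only_pure_issues(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop entries that are pull requests (marked by a "pull_request" key)."""
    return [item for item in items if "pull_request" not in item]


# Hand-written sample in the shape of the API response, not real data.
sample = [
    {"number": 1, "title": "Example issue"},
    {"number": 2, "title": "Example PR", "pull_request": {"url": "..."}},
]
print([item["number"] for item in only_pure_issues(sample)])
```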
Load GitHub File Content
The code below loads all Markdown files in the repo langchain-ai/langchain.
from langchain_community.document_loaders import GithubFileLoader
loader = GithubFileLoader(
    repo="langchain-ai/langchain",  # the repo name
    access_token=ACCESS_TOKEN,
    github_api_url="https://api.github.com",
    file_filter=lambda file_path: file_path.endswith(
        ".md"
    ),  # load all Markdown files.
)
documents = loader.load()
Example output for one of the documents:
document.metadata:
{
    "path": "README.md",
    "sha": "82f1c4ea88ecf8d2dfsfx06a700e84be4",
    "source": "https://github.com/langchain-ai/langchain/blob/master/README.md"
}
document.page_content:
mock content
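The file_filter argument used above is an ordinary callable that receives a file path and returns True for files to load, so more selective predicates can be written as named functions. The following is a minimal sketch: make_file_filter is a hypothetical helper, and it assumes the loader passes repository-relative paths with forward slashes, as the "path": "README.md" entry in the example output suggests.

```python
from pathlib import PurePosixPath
from typing import Callable, Iterable


def make_file_filter(
    suffixes: Iterable[str], under: str = ""
) -> Callable[[str], bool]:
    """Return a predicate accepting paths with a given suffix under a directory."""
    allowed = {s.lower() for s in suffixes}
    prefix = PurePosixPath(under) if under else None

    def file_filter(file_path: str) -> bool:
        path = PurePosixPath(file_path)
        if path.suffix.lower() not in allowed:
            return False
        # An empty `under` means "anywhere in the repository".
        return prefix is None or prefix in path.parents

    return file_filter


# Accept .md and .mdx files located anywhere below the docs/ directory.
docs_markdown = make_file_filter([".md", ".mdx"], under="docs")
```

The resulting predicate could then be passed as file_filter=docs_markdown in place of the lambda.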