GitLoader
This component lets you pull files from a Git repository into your Nappai workflow.
You can point it at a local repository or give it a URL to clone, choose a branch, and filter which files to load.
⚠️ DEPRECATION WARNING
This component is deprecated and will be removed in a future version of Nappai.
Please migrate to the recommended alternative components.
How it Works
The GitLoader component uses the `GitLoader` class from LangChain to read files from a Git repository.
When you run it, the component:
- Locates the repository – either from a local path you provide or by cloning the URL you give it.
- Checks the branch – it defaults to `main`, but you can specify any branch name.
- Filters files – you can give a list of patterns (e.g., `*.py`) to include or exclude files, and you can also give a regular expression that looks inside file contents.
- Ignores binary files – any file that contains null bytes is skipped so that the loader only processes text files.
- Returns the data – each file is turned into a `Data` object that can be used by other components in your workflow.
Apart from contacting the Git server when you supply a Clone URL, no external services are called; everything else runs locally on the machine that hosts Nappai.
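The binary check and the content filter described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the code Nappai or LangChain actually runs: the helper names `is_binary`, `matches_content`, and `select_text_files` are hypothetical, and the sketch assumes files are read as UTF-8 with undecodable bytes ignored.

```python
import re
from pathlib import Path
from typing import List, Optional


def is_binary(path: Path) -> bool:
    """Treat a file as binary if it contains at least one null byte."""
    return b"\x00" in path.read_bytes()


def matches_content(path: Path, content_pattern: Optional[str]) -> bool:
    """Return True when no pattern is given or the regex is found in the file."""
    if not content_pattern:
        return True
    # Assumption: decode as UTF-8 and ignore bytes that cannot be decoded.
    text = path.read_text(encoding="utf-8", errors="ignore")
    return re.search(content_pattern, text) is not None


def select_text_files(root: Path, content_pattern: Optional[str] = None) -> List[Path]:
    """Walk a directory, skip binary files, and apply the content filter."""
    selected = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything stored under the .git metadata folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        if is_binary(path):
            continue
        if matches_content(path, content_pattern):
            selected.append(path)
    return selected
```

Reading each file in full keeps the sketch short; a real loader might inspect only the first few kilobytes to decide whether a file is binary.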
Inputs
- Branch: The branch to load files from. Defaults to `main`.
- Clone URL: The URL to clone the Git repository from.
- Content Filter: A regex pattern to filter files based on their content.
- File Filter: A list of patterns to filter files. Example to include only `.py` files: `*.py`. Example to exclude `.py` files: `!*.py`. Multiple patterns can be separated by commas.
- Repository Path: The local path to the Git repository. (Required)
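To make the File Filter syntax concrete, here is one way a comma-separated pattern string could be turned into a predicate. It is a sketch under assumptions rather than the component's real logic: the functions `parse_file_filter` and `build_file_filter` are hypothetical, patterns are matched against the file name only, and a file is kept when it matches no exclude pattern and either matches an include pattern or no include patterns were given.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Callable, List, Tuple


def parse_file_filter(raw: str) -> Tuple[List[str], List[str]]:
    """Split a comma-separated string into include and exclude patterns."""
    includes: List[str] = []
    excludes: List[str] = []
    for item in raw.split(","):
        pattern = item.strip()
        if not pattern:
            continue
        if pattern.startswith("!"):
            excludes.append(pattern[1:])  # drop the leading "!"
        else:
            includes.append(pattern)
    return includes, excludes


def build_file_filter(raw: str) -> Callable[[str], bool]:
    """Return a predicate that decides whether a file path should be loaded."""
    includes, excludes = parse_file_filter(raw)

    def keep(file_path: str) -> bool:
        name = Path(file_path).name  # assumption: match on the file name only
        if any(fnmatchcase(name, pattern) for pattern in excludes):
            return False
        return not includes or any(fnmatchcase(name, pattern) for pattern in includes)

    return keep
```

With this sketch, `build_file_filter("*.py, !test_*.py")` keeps `app.py` and rejects both `test_app.py` and `README.md`.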
Outputs
- Data: A list of `Data` objects, each representing a file that was loaded from the repository.
Usage Example
- Drag the GitLoader component onto the canvas.
- Fill in the fields (only Repository Path is required):
  - Repository Path – e.g., `/home/user/myrepo`
  - Branch – `main` (or another branch)
  - File Filter – `*.py` to only load Python files
- (Optional) Add a Content Filter – e.g., `def` to only load files whose contents match that pattern, such as files that contain a function definition.
- Connect the Data output to the next component in your workflow, such as a text‑analysis or AI‑assistant component.
- Run the workflow. The component will load the matching files and pass them forward.
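For readers who want to see what the same configuration looks like in code, the sketch below calls the LangChain `GitLoader` class directly. It assumes the `langchain-community` and `GitPython` packages are installed; the wrapper `load_repo_documents` and the filter `only_python` are hypothetical names used for illustration, and this is not necessarily how the Nappai component wires things up internally.

```python
from typing import Callable, List, Optional


def only_python(file_path: str) -> bool:
    """File filter equivalent to the *.py pattern: keep Python files only."""
    return file_path.endswith(".py")


def load_repo_documents(
    repo_path: str,
    clone_url: Optional[str] = None,
    branch: str = "main",
    file_filter: Optional[Callable[[str], bool]] = None,
) -> List:
    """Load files from a Git repository as LangChain documents."""
    # Imported lazily so this module can be imported without LangChain installed.
    from langchain_community.document_loaders import GitLoader

    loader = GitLoader(
        repo_path=repo_path,      # local path, or the clone target when clone_url is set
        clone_url=clone_url,      # leave as None to use an existing local repository
        branch=branch,
        file_filter=file_filter,  # called with each file path; return True to keep it
    )
    return loader.load()
```

A call such as `load_repo_documents("/home/user/myrepo", branch="main", file_filter=only_python)` mirrors the field values used in the steps above.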
Related Components
- TextSplitter – Breaks large documents into smaller chunks for easier processing.
- VectorStore – Stores documents in a vector database for similarity search.
- PromptTemplate – Creates prompts that can include the loaded data.
Tips and Best Practices
- Keep the File Filter as specific as possible to reduce load time.
- Use the Content Filter sparingly; complex regexes can slow down the loader.
- If you only need a few files, consider cloning the repository manually and pointing the component to the local path.
- Remember that binary files are automatically ignored, so you don’t need to add extra filters for them.
Security Considerations
- The component reads files from the local filesystem or from a cloned repository, so the loaded content is only as trustworthy as the repository and the Git server it came from.
- Avoid giving the component access to sensitive repositories unless you trust the environment it runs in.
- If you use a public clone URL, ensure it is from a reputable source to prevent malicious code from being loaded.
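One possible way to act on the last point is to check a clone URL against an allowlist before handing it to the component. The sketch below is illustrative only: the `TRUSTED_HOSTS` set and the `is_trusted_clone_url` helper are hypothetical, and it assumes that only HTTPS URLs on explicitly listed hosts should be accepted.

```python
from typing import Iterable
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the hosts your organisation trusts.
TRUSTED_HOSTS = {"github.com", "gitlab.com"}


def is_trusted_clone_url(url: str, trusted_hosts: Iterable[str] = TRUSTED_HOSTS) -> bool:
    """Accept only HTTPS clone URLs whose host is exactly on the allowlist."""
    parsed = urlparse(url)
    # hostname is lowercased by urlparse and is None when the URL has no host part.
    return parsed.scheme == "https" and parsed.hostname in set(trusted_hosts)
```

Matching the full host name, rather than testing whether the URL merely contains a trusted name, rejects look-alike hosts such as `github.com.evil.example`.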