GitLoader

This component lets you pull files from a Git repository into your Nappai workflow.
You can point it at a local repository or give it a URL to clone, choose a branch, and filter which files to load.

⚠️ DEPRECATION WARNING

This component is deprecated and will be removed in a future version of Nappai.
Please migrate to the recommended alternative components.

How it Works

The GitLoader component uses the GitLoader class from LangChain to read files from a Git repository.
When you run it, the component:

  1. Locates the repository – either from a local path you provide or by cloning the URL you give it.
  2. Checks out the branch – it defaults to main, but you can specify any branch name.
  3. Filters files – you can supply a list of patterns (e.g., *.py) to include or exclude files by name, and a regular expression that is matched against file contents.
  4. Ignores binary files – any file that contains null bytes is skipped so that the loader only processes text files.
  5. Returns the data – each file is turned into a Data object that can be used by other components in your workflow.
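Steps 3 and 4 above can be pictured with a small standard-library sketch. It is an illustration only, not Nappai's actual code: the function names `is_binary` and `passes_content_filter` are hypothetical, and decoding as UTF-8 with errors ignored is an assumption.

```python
import re
from typing import Optional


def is_binary(raw: bytes) -> bool:
    """Treat any payload that contains a null byte as binary (step 4)."""
    return b"\x00" in raw


def passes_content_filter(raw: bytes, content_filter: Optional[re.Pattern]) -> bool:
    """Return True for text files that match the optional regex (step 3)."""
    if is_binary(raw):
        return False
    if content_filter is None:
        # No Content Filter configured: every text file passes.
        return True
    # Assumption: files are decoded as UTF-8 and undecodable bytes are dropped.
    text = raw.decode("utf-8", errors="ignore")
    return content_filter.search(text) is not None


# Compile the pattern once and reuse it for every file.
has_function_def = re.compile(r"\bdef\s+\w+")
```

Compiling the pattern once, outside the per-file loop, keeps the cost of a Content Filter down when a repository has many files.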

Apart from contacting the Git server when a Clone URL is given, the component calls no external services; everything else runs locally on the machine that hosts Nappai.

Inputs

  • Branch: The branch to load files from. Defaults to main.
  • Clone URL: The URL to clone the Git repository from.
  • Content Filter: A regex pattern to filter files based on their content.
  • File Filter: A list of patterns to filter files. Example to include only .py files: *.py. Example to exclude .py files: !*.py. Multiple patterns can be separated by commas.
  • Repository Path: The local path to the Git repository. (Required)
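To make the File Filter syntax concrete, here is a minimal sketch of how a comma-separated pattern string with `!` exclusions could be interpreted. The helper names are hypothetical, and several details are assumptions rather than documented behaviour: patterns are matched against the file name only, exclusions take precedence over inclusions, and an empty include list accepts everything.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Callable, List, Tuple


def parse_file_filter(spec: str) -> Tuple[List[str], List[str]]:
    """Split 'a, !b' style input into (include patterns, exclude patterns)."""
    includes: List[str] = []
    excludes: List[str] = []
    for raw in spec.split(","):
        pattern = raw.strip()
        if not pattern:
            continue
        if pattern.startswith("!"):
            excludes.append(pattern[1:])
        else:
            includes.append(pattern)
    return includes, excludes


def make_file_filter(spec: str) -> Callable[[str], bool]:
    """Build a predicate that decides whether a path should be loaded."""
    includes, excludes = parse_file_filter(spec)

    def accept(path: str) -> bool:
        name = PurePosixPath(path).name  # assumption: match on the base name
        if any(fnmatchcase(name, p) for p in excludes):
            return False
        # With no include patterns, everything not excluded is accepted.
        return not includes or any(fnmatchcase(name, p) for p in includes)

    return accept
```

For example, `make_file_filter("*.py, !test_*.py")` would accept `src/app.py` but reject both `tests/test_app.py` and `README.md` under these assumptions.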

Outputs

  • Data: A list of Data objects, each representing a file that was loaded from the repository.

Usage Example

  1. Drag the GitLoader component onto the canvas.
  2. Fill in the fields (only Repository Path is required):
    • Repository Path – e.g., /home/user/myrepo
    • Branch – main (or another branch)
    • File Filter – *.py to only load Python files
  3. (Optional) Add a Content Filter – e.g., def to only load files that contain a function definition.
  4. Connect the Data output to the next component in your workflow, such as a text‑analysis or AI‑assistant component.
  5. Run the workflow. The component will load the matching files and pass them forward.

Related Components

  • TextSplitter – Breaks large documents into smaller chunks for easier processing.
  • VectorStore – Stores documents in a vector database for similarity search.
  • PromptTemplate – Creates prompts that can include the loaded data.
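For readers who want to reproduce the Usage Example settings outside the canvas, the sketch below calls LangChain's GitLoader class directly. It assumes the `langchain-community` and `GitPython` packages are installed; `is_python_file` and `load_python_files` are hypothetical helper names, and the Nappai component may wrap the class differently.

```python
from typing import List, Optional


def is_python_file(path: str) -> bool:
    """Rough equivalent of the File Filter pattern *.py."""
    return path.endswith(".py")


def load_python_files(
    repo_path: str,
    clone_url: Optional[str] = None,
    branch: str = "main",
) -> List:
    """Load the Python files of a repository as LangChain Document objects."""
    # Imported lazily so this module can be imported without the optional
    # dependencies (assumption: langchain-community and GitPython are installed).
    from langchain_community.document_loaders import GitLoader

    loader = GitLoader(
        repo_path=repo_path,   # Repository Path
        clone_url=clone_url,   # Clone URL (optional)
        branch=branch,         # Branch
        file_filter=is_python_file,
    )
    return loader.load()
```

Calling `load_python_files("/home/user/myrepo")` would return one document per matching file, which is roughly what the component then converts into Data objects.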

Tips and Best Practices

  • Keep the File Filter as specific as possible to reduce load time.
  • Use the Content Filter sparingly; complex regexes can slow down the loader.
  • If you only need a few files, consider cloning the repository manually and pointing the component to the local path.
  • Remember that binary files are automatically ignored, so you don’t need to add extra filters for them.
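The manual-clone tip above can be scripted. The following is a sketch under stated assumptions: it builds a shallow, single-branch `git clone` command and runs it with the `git` executable found on the PATH. The function names are hypothetical, and a shallow clone is a suggestion for speed, not something the component requires.

```python
import subprocess
from typing import List


def build_clone_command(url: str, dest: str, branch: str = "main") -> List[str]:
    """Return the argument list for a shallow clone of a single branch."""
    return [
        "git", "clone",
        "--depth", "1",          # fetch only the latest commit
        "--branch", branch,
        "--single-branch",
        "--",                    # end of options; url and dest follow
        url,
        dest,
    ]


def clone_shallow(url: str, dest: str, branch: str = "main") -> None:
    """Run the clone; raises CalledProcessError if git exits with an error."""
    subprocess.run(build_clone_command(url, dest, branch), check=True)
```

After the clone finishes, point Repository Path at `dest` and leave Clone URL empty.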

Security Considerations

  • The component reads files from the local filesystem or from a cloned repository, so the content it loads is only as trustworthy as the filesystem and the Git server it came from.
  • Avoid giving the component access to sensitive repositories unless you trust the environment it runs in.
  • If you use a public clone URL, ensure it is from a reputable source to prevent malicious code from being loaded.
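One way to act on the last point is to check a Clone URL against an allow-list before handing it to the component. This is a hypothetical sketch: the `TRUSTED_HOSTS` set and the `is_trusted_clone_url` function are illustrative only, and requiring HTTPS is an assumption you may want to relax for SSH remotes.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; replace with the hosts your team actually trusts.
TRUSTED_HOSTS = frozenset({"github.com", "gitlab.com"})


def is_trusted_clone_url(url: str, trusted_hosts=TRUSTED_HOSTS) -> bool:
    """Accept only HTTPS URLs whose host is exactly on the allow-list."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower()
    # Exact match, so look-alike hosts such as github.com.evil.example fail.
    return host in trusted_hosts
```

A host check like this does not vet the repository contents; it only narrows where the files can come from.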