GitLoader

Load files from a Git repository into your Nappai workflow. This component reads documents stored in a Git repository and filters them by file name and content, so that only the files you need are loaded.

Relationship with Git

This component directly interacts with Git repositories. It allows you to specify a repository’s location (either a local path or a URL) and then download specific files based on your filtering criteria.

Inputs

  • Repository Path: The local path to your Git repository on your computer. This is required. For example: /Users/myuser/myrepo.
  • Clone URL: The URL of your Git repository (e.g., https://github.com/username/repo.git). This is optional. If provided, Nappai clones the repository from this URL. If omitted, the Repository Path must point to a repository that already exists locally.
  • Branch: The specific branch within the Git repository to load files from. Defaults to ‘main’. For example: develop or feature/new-feature.
  • File Filter: (Advanced) A way to select files based on their names. Use wildcards like *.csv to include all CSV files, or !*.txt to exclude all text files. You can combine multiple patterns using commas (e.g., *.csv, *.xlsx, !report.csv).
  • Content Filter: (Advanced) A more powerful way to filter files based on the text inside the files. This uses regular expressions (regex), which are advanced search patterns. Only use this if you are familiar with regex.
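
The two filters can be illustrated with a short Python sketch. This is not Nappai's implementation: the function names are hypothetical, and the sketch assumes that patterns are matched against the file name only, that a leading ! marks an exclusion, and that an exclusion takes precedence over an inclusion.

```python
import re
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def matches_file_filter(path: str, file_filter: str) -> bool:
    """Return True if the file name passes a comma-separated pattern list."""
    patterns = [p.strip() for p in file_filter.split(",") if p.strip()]
    include = [p for p in patterns if not p.startswith("!")]
    exclude = [p[1:] for p in patterns if p.startswith("!")]
    name = PurePosixPath(path).name
    # Assumption: an exclusion always wins over an inclusion.
    if any(fnmatchcase(name, p) for p in exclude):
        return False
    # Assumption: with no inclusion patterns, every non-excluded file passes.
    return not include or any(fnmatchcase(name, p) for p in include)


def matches_content_filter(text: str, content_filter: str) -> bool:
    """Return True if the regular expression matches anywhere in the text."""
    return re.search(content_filter, text) is not None
```

Under these assumptions, the filter *.csv, *.xlsx, !report.csv accepts data/sales.csv and rejects both data/report.csv and notes.txt. A Content Filter such as total_\d+ keeps only files whose text contains "total_" followed by at least one digit.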

Outputs

  • Data: A list of the files that were successfully loaded from the Git repository and meet your filtering criteria. This data can then be used by other components in your Nappai workflow (e.g., to analyze the content, extract information, or send it to another system).

Usage Example

Let’s say you have a Git repository containing CSV files with sales data. You want to load only the files from the sales_data branch that end in .csv.

  1. In the Nappai dashboard, add the GitLoader component to your workflow.
  2. Enter the Repository Path (if the repository is already cloned locally) or the Clone URL for your repository.
  3. Set the Branch to sales_data.
  4. In the File Filter field, enter *.csv.
  5. Run the workflow.
  6. The Data output will contain a list of the CSV files from the sales_data branch. You can then connect this output to other components, such as the Google Sheet Writer to upload the data to a Google Sheet.
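
Outside Nappai, roughly the same steps can be reproduced with the git command line. The Python sketch below only approximates what the component does and is not its implementation. The helper names are hypothetical, and running list_csv_files assumes that git is installed and the repository is available at the given path.

```python
import subprocess


def clone_command(clone_url: str, repo_path: str, branch: str = "main") -> list[str]:
    """Build the git command that clones a single branch into repo_path."""
    return ["git", "clone", "--branch", branch, "--single-branch", clone_url, repo_path]


def list_files_command(repo_path: str, branch: str = "main") -> list[str]:
    """Build the git command that lists every file tracked on a branch."""
    return ["git", "-C", repo_path, "ls-tree", "-r", "--name-only", branch]


def list_csv_files(repo_path: str, branch: str = "main") -> list[str]:
    """Run git and keep only the paths that end in .csv."""
    result = subprocess.run(
        list_files_command(repo_path, branch),
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line.endswith(".csv")]
```

Calling list_csv_files with the repository path and the branch sales_data returns the paths of the CSV files on that branch, which corresponds to what the File Filter *.csv selects in the steps above.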

Templates

The Data output is typically connected to components such as:

  • Google Sheet Writer: Upload the data loaded from Git to a Google Sheet.
  • PGVector: Analyze the loaded data using vector embeddings for semantic search.
  • Summarizer: Summarize the content of the loaded files.
  • Many more: The Data output can be used as input for a wide variety of Nappai components depending on your needs.

Tips and Best Practices

  • Start with simple filters. Only use advanced filtering options (File Filter and Content Filter) if you need precise control over which files are loaded.
  • Ensure your Git repository is accessible. Check your network connection and repository permissions.
  • For large repositories, apply a File Filter so that fewer files have to be loaded. This is the main way to keep load times manageable.

Security Considerations

  • Only provide access to Git repositories that you own or have explicit permission to access.
  • Be mindful of the data you are loading from the repository, especially if it contains sensitive information. Consider using appropriate data masking or encryption techniques if necessary.
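
As one illustration of data masking, the Python sketch below redacts email addresses from loaded text before it is passed to another component. The function name and the deliberately simple pattern are assumptions made for this example. Real data usually needs masking rules tailored to its contents.

```python
import re

# Simplified pattern for illustration; it does not cover every valid address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every email-like substring with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, text)
```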