GitLoader
Load files from a Git repository into your Nappai workflow. This component lets you easily access documents stored in a Git repository, filtering them by name and content to get exactly what you need.
Relationship with Git
This component directly interacts with Git repositories. It allows you to specify a repository’s location (either a local path or a URL) and then download specific files based on your filtering criteria.
Inputs
- Repository Path: The local path to your Git repository on your computer (for example, /Users/myuser/myrepo). This is required.
- Clone URL: The URL of your Git repository (e.g., https://github.com/username/repo.git). This is optional. If provided, Nappai will clone the repository. If not provided, you must provide a Repository Path.
- Branch: The specific branch within the Git repository to load files from. Defaults to ‘main’. For example: develop or feature/new-feature.
- File Filter: (Advanced) A way to select files based on their names. Use wildcards like *.csv to include all CSV files, or !*.txt to exclude all text files. You can combine multiple patterns using commas (e.g., *.csv, *.xlsx, !report.csv).
- Content Filter: (Advanced) A more powerful way to filter files based on the text inside the files. This uses regular expressions (regex), which are advanced search patterns. Only use this if you are familiar with regex.
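To make the two filter inputs more concrete, here is a minimal Python sketch of how a comma-separated File Filter and a regex Content Filter could be evaluated. It is not Nappai's implementation: the function names are hypothetical, and the matching rules (patterns are compared against the file name only, exclusions win over inclusions, and a filter with only exclusions keeps everything else) are assumptions made for illustration.

```python
from __future__ import annotations

import re
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_patterns(file_filter: str) -> tuple[list[str], list[str]]:
    """Split a comma-separated filter into include and exclude patterns."""
    includes: list[str] = []
    excludes: list[str] = []
    for raw in file_filter.split(","):
        pattern = raw.strip()
        if not pattern:
            continue  # ignore empty entries such as a trailing comma
        if pattern.startswith("!"):
            excludes.append(pattern[1:])  # "!*.txt" excludes "*.txt"
        else:
            includes.append(pattern)
    return includes, excludes


def matches_file_filter(path: str, file_filter: str) -> bool:
    """Return True if the file name passes the filter (assumed semantics)."""
    name = PurePosixPath(path).name
    includes, excludes = parse_patterns(file_filter)
    if any(fnmatchcase(name, p) for p in excludes):
        return False  # assumption: an exclusion always wins
    # Assumption: with no include patterns, every non-excluded file passes.
    return not includes or any(fnmatchcase(name, p) for p in includes)


def matches_content_filter(text: str, content_filter: str) -> bool:
    """Return True if the regex matches anywhere in the file's text."""
    return re.search(content_filter, text) is not None
```

With these assumed rules, the filter *.csv, *.xlsx, !report.csv keeps data/sales.csv but drops data/report.csv, and a Content Filter of total_\w+ keeps only files whose text contains a word starting with total_.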
Outputs
- Data: A list of the files that were successfully loaded from the Git repository and meet your filtering criteria. This data can then be used by other components in your Nappai workflow (e.g., to analyze the content, extract information, or send it to another system).
Usage Example
Let’s say you have a Git repository containing CSV files with sales data. You want to load only the files from the sales_data branch that end in .csv.
- In the Nappai dashboard, add the GitLoader component to your workflow.
- Enter the Repository Path (if the repository is already cloned locally) or the Clone URL for your repository.
- Set the Branch to sales_data.
- In the File Filter field, enter *.csv.
- Run the workflow.
- The Data output will contain a list of the CSV files from the sales_data branch. You can then connect this output to other components, such as the Google Sheet Writer, to upload the data to a Google Sheet.
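For readers who want a feel for what this example produces, the following Python sketch mimics the result outside Nappai. It assumes the sales_data branch is already checked out in a local directory, and it uses a hypothetical record shape (a source path plus the file's text); the actual structure of the Data output may differ.

```python
from __future__ import annotations

from pathlib import Path


def load_csv_files(checkout_dir: str) -> list[dict[str, str]]:
    """Collect every *.csv file under a local checkout as a simple record."""
    root = Path(checkout_dir)
    records: list[dict[str, str]] = []
    for path in sorted(root.rglob("*.csv")):
        relative = path.relative_to(root)
        # Skip Git's internal directory and anything that is not a regular file.
        if ".git" in relative.parts or not path.is_file():
            continue
        records.append(
            {
                "source": relative.as_posix(),  # hypothetical field name
                "text": path.read_text(encoding="utf-8", errors="replace"),
            }
        )
    return records
```

Calling load_csv_files("/Users/myuser/myrepo") would return one record per CSV file, sorted by path, which is roughly the kind of list a downstream component would receive.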
Related Components
- Google Sheet Writer: Upload the data loaded from Git to a Google Sheet.
- PGVector: Analyze the loaded data using vector embeddings for semantic search.
- Summarizer: Summarize the content of the loaded files.
- Many more: The Data output can be used as input for a wide variety of Nappai components depending on your needs.
Tips and Best Practices
- Start with simple filters. Only use advanced filtering options (File Filter and Content Filter) if you need precise control over which files are loaded.
- Ensure your Git repository is accessible. Check your network connection and repository permissions.
- For large repositories, use the File Filter and Content Filter to limit which files are loaded. Loading fewer files is crucial for performance.
Security Considerations
- Only provide access to Git repositories that you own or have explicit permission to access.
- Be mindful of the data you are loading from the repository, especially if it contains sensitive information. Consider using appropriate data masking or encryption techniques if necessary.