Apache Tika – Contribute
Contribute to Apache Tika
Apache Tika is an Open Source project built and maintained by a diverse range of contributors. We welcome contributions of all types to the project - code, documentation, testing, bug triage, user support, and more! Send an email to the
Tika development list
if you're looking for somewhere to help.
Source Code
To download the source code for the latest release of Apache Tika, please see the
Download page
The master copy of the Apache Tika source code is held in Git. You can fetch (clone) the source from
GitHub
, which you are welcome to fork and open pull requests against.
You can also clone (checkout) the code from
and you can browse it online through the
Git web interface
Reporting Issues
Tika uses the
ASF JIRA instance
for issue tracking, under the
Tika Project
When reporting an issue, please try to include the details, steps and documents required to reproduce it. If several documents trigger the issue, please attach a small one that we can use in unit tests. A JUnit unit test showing the problem can be helpful, but isn't required.
If you're new to reporting problems, you might find the
How to Report Bugs Effectively
essay (amongst many others) useful for learning more about what makes an effective and helpful bug report.
New Parsers, Detectors and Mime Types
The
Parser Quick Start Guide
provides instructions on adding new mime types and new parsers to Tika.
If your new Parser or Detector depends on libraries which we cannot include in Tika for license reasons, you are encouraged to list it on the
3rd Party Parser Plugins
page on the Tika wiki.
Submitting Enhancements and Fixes
All enhancements and fixes should have a
JIRA Issue or Enhancement
opened for them. This should describe the problem and the proposed fix / new code. The JIRA can be used for discussions on the code, and provides a single identifier for the change.
Git - Git users can run
git diff --no-prefix
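to generate a patch of their changes, which can then be attached to an issue. Untracked files are not included until they are added to the index (for example with git add -N), and binary content is only included with the --binary option.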
GitHub Pull Requests - If you are working from our
GitHub mirror
, it is possible to open a pull request for your change. Please include the JIRA Issue number in the pull request, so it can be linked by the ASF GitHub bot.
ReviewBoard - If you have a Work-In-Progress patch for which you would like feedback / review / assistance, you can use the
Apache ReviewBoard Instance
to post your code. Please reference the JIRA Issue number from the review request, and add a link to it to the JIRA Issue.
Unit tests, License Headers - Wherever possible, we like new functionality and fixes to include small-ish unit tests. Whenever you make changes, please re-run the unit test suite (
mvn install
is one way to trigger this), and ensure your changes don't break anything. If adding new files, please include the Apache License v2 license header at the top of the file.
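As an illustration of the license header requirement, a new Java source file would begin with the standard ASF source header, ahead of any package and import statements. This is a minimal sketch: the class and method below are hypothetical placeholders, and only the header text is the standard one.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// Hypothetical class, shown only to put the header in context.
// A real source file would also declare its package after the header.
final class LicenseHeaderExample {

    private LicenseHeaderExample() {
    }

    // Returns a short, human-readable label for the given media type name.
    static String describe(String mediaType) {
        return "Parsed as " + mediaType;
    }
}
```

The header is a plain block comment, so it has no effect on compilation.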
Dependencies
Any new dependencies introduced must be under a suitable license. Broadly, they must be Open Source, and must not place restrictions on larger works they are incorporated within. A list of the allowed licenses is maintained by the
ASF Legal Affairs Committee
. If in doubt, check on the dev list.
All new and updated dependencies must be in Maven Central. (It is not possible for Apache releases to depend on additional repositories in their POMs.) If possible, the project producing the dependency should be asked to publish it to Central, such as through the
Sonatype OSS Maven Repo
. If that isn't possible, someone will need to upload it via the
Sonatype 3rd Party OSS Artifacts process
. This will need to be completed before any patches depending on the new library can be committed to Tika.
Code Formatting
Java code should be indented with 4 spaces, not tabs. Opening braces should normally be on the same line as the statement. Standard Java coding conventions are normally followed, but if in doubt, follow what the existing code does!
Imports should normally be explicit; wildcard (foo.*) imports should not normally be used. Imports should be ordered by javax, then java, then other.
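The rules above might look like this in practice. This is only a sketch using JDK classes: the class and method names are hypothetical, and the existing Tika code remains the reference if anything here differs from it.

```java
// Imports are explicit (no wildcards) and grouped as described above:
// javax first, then java, then everything else (none needed in this sketch).
import javax.xml.XMLConstants;

import java.util.ArrayList;
import java.util.List;

// Hypothetical class, used only to illustrate the layout rules.
final class FormattingExample {

    private FormattingExample() {
    }

    // Opening brace on the same line as the declaration; 4-space indentation.
    static List<String> nonEmpty(List<String> values) {
        List<String> result = new ArrayList<>();
        for (String value : values) {
            if (value != null && !value.isEmpty()) {
                result.add(value);
            }
        }
        return result;
    }

    // Checks whether the given prefix is the reserved "xml" namespace prefix.
    static boolean isXmlPrefix(String prefix) {
        return XMLConstants.XML_NS_PREFIX.equals(prefix);
    }
}
```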
From time to time, you may find that code you are working on doesn't follow these rules. If so, please don't combine logic changes and formatting changes in a single patch, as such patches are very hard to review. Instead, please submit two patches: one to correct formatting problems, and a second for your logic changes / fixes.
Other Resources
The
Apache Community Development project (ComDev)
provides general advice on getting started with contributing to Apache projects
The Apache Nutch project provides a comprehensive guide on
becoming a Nutch Developer
, much of which applies equally to Apache Tika
The book
Tika in Action
has a lot of great information on how Tika works, and how to extend it
Copyright © 2026
The Apache Software Foundation
Apache Tika, Tika, Apache, the Apache feather logo, and the Apache
Tika project logo are trademarks of The Apache Software Foundation.