Special characters in searching feature with Tika #11060

Open
jesmrec opened this issue Feb 25, 2025 · 1 comment

jesmrec commented Feb 25, 2025

Describe the bug

Take a file whose content includes special characters such as any of these: ñíôùë*+ç$%&@?. When the iOS client performs a server-side search over the content using a special character, the server returns an empty or incomplete result. Some results seem to be incorrect.

Steps to reproduce

Create a txt file called sum.txt with the following content:

one+one=two
four+four=eight

Using the iOS client, search by content:

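| Typed string in client | Returned | Passed? | Comment |
|---|---|---|---|
| one | no result | no | one exists twice |
| four | sum.txt | yes | |
| + | no result | no | + exists |
| = | no result | no | = exists |
| on | sum.txt | yes | |
| one+one | no result | no | one+one fits the content |
| four+four | no result | no | four+four fits the content |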

There are more examples with the other characters mentioned above.
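As a baseline for the "Passed?" column, here is a minimal local sketch (not part of oCIS or the iOS client) that recreates sum.txt and checks each typed string as a literal substring. The helper name `expected_match` is made up for illustration. Every one of the seven strings is reported as a match, so a substring-style search would be expected to return sum.txt for all of them.

```shell
#!/bin/sh
# Recreate sum.txt from the report in a temporary directory.
workdir=$(mktemp -d)
printf 'one+one=two\nfour+four=eight\n' > "$workdir/sum.txt"

# expected_match TERM (hypothetical helper): prints "match" if TERM occurs
# literally in sum.txt, "no match" otherwise. grep -F treats TERM as a fixed
# string, so "+" and "=" are not interpreted as regex operators.
expected_match() {
    if grep -qF -- "$1" "$workdir/sum.txt"; then
        echo "match"
    else
        echo "no match"
    fi
}

# The seven strings typed in the client.
for term in 'one' 'four' '+' '=' 'on' 'one+one' 'four+four'; do
    printf '%-10s %s\n' "$term" "$(expected_match "$term")"
done
```

This only shows what a plain substring check finds in the file. It does not say anything about how the server tokenizes or indexes the content.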

This curl command reproduces the request:

curl -X REPORT https://xx.xx.xx.xx:9200/remote.php/dav/spaces \
  -H 'Host: xx.xx.xx.xx:9200' \
  -H 'Original-Request-ID: B88FCC21-5664-4706-96A3-C3C2B314770B' --compressed \
  -H 'Connection: keep-alive' \
  -H 'User-Agent: ownCloudApp/12.4.0 (App/296; iOS/18.2; iPhone)' \
  -H 'Accept-Language: en' \
  -H 'Authorization: Bearer ...' \
-d '<?xml version="1.0" encoding="UTF-8"?>
<oc:search-files xmlns:D="DAV:" xmlns:oc="http://owncloud.org/ns">
  <D:prop>
  <D:resourcetype/>
  <D:getlastmodified/>
  <D:getcontentlength/>
  <D:getcontenttype/>
  <D:getetag/>
  <oc:id/>
  <oc:size/>
  <oc:permissions/>
  <oc:favorite/>
  <oc:share-types/>
  <oc:owner-id/>
  <oc:owner-display-name/>
  </D:prop>
  <oc:search>
    <oc:pattern>(content:&quot;*one*&quot;)</oc:pattern>
    <oc:limit>100</oc:limit>
  </oc:search>
</oc:search-files>'

Replace *one* in the oc:pattern element to get the different results.
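To avoid hand-editing the pattern for each row of the table, the request body can be generated per search term. This is only a sketch: `xml_escape`, `content_pattern`, `search_body`, and `run_search` are hypothetical helper names, `OCIS_URL` and `OCIS_TOKEN` are placeholder environment variables for the server address and bearer token, and the property list is shortened on the assumption that the server accepts a smaller set.

```shell
#!/bin/sh
# xml_escape STRING: escape &, <, > and " so STRING is safe inside XML text.
# The ampersand is handled first so the other entities are not re-escaped.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# content_pattern TERM: wrap TERM in the wildcard content query used in the
# report, (content:"*TERM*"), then XML-escape the result.
content_pattern() {
    xml_escape "(content:\"*$1*\")"
}

# search_body TERM: print a shortened REPORT body for TERM.
search_body() {
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<oc:search-files xmlns:D="DAV:" xmlns:oc="http://owncloud.org/ns">
  <D:prop>
    <D:resourcetype/>
    <oc:id/>
  </D:prop>
  <oc:search>
    <oc:pattern>$(content_pattern "$1")</oc:pattern>
    <oc:limit>100</oc:limit>
  </oc:search>
</oc:search-files>
EOF
}

# run_search TERM: send the generated body to the server.
# OCIS_URL and OCIS_TOKEN are assumptions, not values from the report.
run_search() {
    search_body "$1" | curl -sS -X REPORT "$OCIS_URL/remote.php/dav/spaces" \
        -H "Authorization: Bearer $OCIS_TOKEN" \
        --data-binary @-
}
```

With the two variables exported, something like `for t in one four + = on one+one four+four; do run_search "$t"; done` would walk through the table. On a test instance with a self-signed certificate, curl may additionally need `-k`.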

Setup

The server runs in a Docker container with server-side search enabled, created from the following docker-compose.yml file:

version: "3.7"

services:
  ocis:
    image: owncloud/ocis:7.0.1
    ports:
      - "9200:9200"
      - "9215:9215"
    environment:
      OCIS_INSECURE: "true"
      OCIS_URL: "https://IP:9200"
      IDM_CREATE_DEMO_USERS: "true"
      IDM_ADMIN_PASSWORD: "admin"
      PROXY_ENABLE_BASIC_AUTH: "true"
      OCIS_SERVICE_ACCOUNT_ID: "b0fbfad7-3dd6-49cb-b468-3f588f2f82be"
      OCIS_SERVICE_ACCOUNT_SECRET: "asaGE4DF"
      SEARCH_EXTRACTOR_TYPE: tika
      SEARCH_EXTRACTOR_TIKA_TIKA_URL: "http://tika:9998"
      FRONTEND_FULL_TEXT_SEARCH_ENABLED: "true"
    restart: "no"
    entrypoint: ["/bin/sh"]
    command: ["-c", "ocis init || true; ocis server"]
    networks:
      - ocis-net

  tika:
    image: apache/tika:2.9.0.0-full
    restart: "always"
    networks:
      - ocis-net

networks:
  ocis-net:




jesmrec commented Feb 26, 2025

On the other hand, using a date to narrow the search does not work either:

<oc:search>
    <oc:pattern>((mtime:&gt;2024-12-31T23:00:00+00:00) AND (name:&quot;*Joe*&quot;))</oc:pattern>
    <oc:limit>100</oc:limit>
</oc:search>

This returns an empty result set.
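One way to narrow this down is to send each clause on its own and then the combination, to see which part produces the empty result. The sketch below only prints the three `<oc:pattern>` elements, and the helper names `xml_escape` and `pattern_element` are made up for illustration. The last line of output is the same pattern as in the request above.

```shell
#!/bin/sh
# xml_escape STRING: escape &, <, > and " so STRING is safe inside XML text.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# The two clauses of the failing query, unescaped.
mtime_clause='(mtime:>2024-12-31T23:00:00+00:00)'
name_clause='(name:"*Joe*")'

# pattern_element QUERY: print the <oc:pattern> element for a raw query.
pattern_element() {
    printf '<oc:pattern>%s</oc:pattern>\n' "$(xml_escape "$1")"
}

pattern_element "$mtime_clause"                       # date filter only
pattern_element "$name_clause"                        # name filter only
pattern_element "($mtime_clause AND $name_clause)"    # combined, as reported
```

If the name-only query returns results and the mtime-only query does not, the date clause is the likely culprit. This is a debugging suggestion, not a confirmed diagnosis.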
