IBX-522: Introduced Docker image for Solr standalone #214

Draft: wants to merge 3 commits into base: 3.3
Changes from 1 commit
12 changes: 12 additions & 0 deletions docker/solr/Dockerfile
@@ -0,0 +1,12 @@
FROM solr:8.6.3

LABEL org.opencontainers.image.source=https://github.com/ezsystems/ezplatform-solr-search-engine

USER root
COPY conf server/solr/configsets/_default/conf
RUN sed --in-place '/<updateRequestProcessorChain name="add-unknown-fields-to-the-schema".*/,/<\/updateRequestProcessorChain>/d' server/solr/configsets/_default/conf/solrconfig.xml
RUN sed --in-place 's/${solr.autoSoftCommit.maxTime:-1}/${solr.autoSoftCommit.maxTime:20}/' server/solr/configsets/_default/conf/solrconfig.xml
Comment on lines +10 to +11
Contributor

Why and what are those?

Contributor Author

@Steveb-p, Jun 8, 2021

They come from:

# Adapt autoSoftCommit to have a recommended value, and remove add-unknown-fields-to-the-schema
sed -i.bak '/<updateRequestProcessorChain name="add-unknown-fields-to-the-schema".*/,/<\/updateRequestProcessorChain>/d' $DESTINATION_DIR/solrconfig.xml
sed -i.bak2 's/${solr.autoSoftCommit.maxTime:-1}/${solr.autoSoftCommit.maxTime:20}/' $DESTINATION_DIR/solrconfig.xml

They are the only commands (other than straight up copying configuration files to the service) that are required to mimic the behavior of host-based Solr (created when init_solr.sh is executed).

From what I understand, the first disables automatic registration of unknown fields in the schema, and the second changes the default autoSoftCommit maxTime from -1 to 20, so Solr commits more often.
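
To illustrate the effect of the two expressions, here is a minimal sketch that runs them against a made-up, heavily simplified solrconfig.xml fragment. The scratch directory, the sample content, and the `other-chain` block are assumptions for illustration only; the two sed expressions are the ones from the Dockerfile, applied through redirection to a new file rather than in place.

```shell
#!/bin/sh
# Hypothetical scratch directory and sample file; neither is part of the PR.
workdir=$(mktemp -d)

# Quoted heredoc delimiter so the shell leaves ${...} untouched.
cat > "$workdir/solrconfig.xml" <<'EOF'
<config>
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>
  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="true">
    <processor class="solr.AddSchemaFieldsUpdateProcessorFactory"/>
  </updateRequestProcessorChain>
  <updateRequestProcessorChain name="other-chain">
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>
</config>
EOF

# 1st expression: delete every line from the opening tag of the
#   add-unknown-fields-to-the-schema chain through its closing tag.
# 2nd expression: change the default soft-commit interval from -1 to 20.
sed \
  -e '/<updateRequestProcessorChain name="add-unknown-fields-to-the-schema".*/,/<\/updateRequestProcessorChain>/d' \
  -e 's/${solr.autoSoftCommit.maxTime:-1}/${solr.autoSoftCommit.maxTime:20}/' \
  "$workdir/solrconfig.xml" > "$workdir/solrconfig.patched.xml"

cat "$workdir/solrconfig.patched.xml"
```

In the patched file the add-unknown-fields-to-the-schema chain is gone, the `other-chain` block is untouched, and the soft-commit default reads 20.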

Contributor

Are they really needed in a test environment i.e. not a long-running instance?

Contributor Author

> Are they really needed in a test environment i.e. not a long-running instance?

As @adamwojs explained to me in a private conversation, they are needed because we sometimes test against a field being empty in the search engine, and this is apparently required for that to work. I didn't dig deeper, to be honest.

Member

> Are they really needed in a test environment i.e. not a long-running instance?

Yes, these config changes are required for the test environment.


USER solr

CMD ["solr-precreate", "collection1"]
19 changes: 19 additions & 0 deletions docker/solr/conf/custom-fields-types.xml
@@ -0,0 +1,19 @@
<!--
This is additional custom example fields. You can come up with similar
fields on your own, if you require custom indexing / query rules.

Instead of using the type "text" you might even want to define a custom
type. In the custom type you can define your dedicated index and query
rules. You can copy any existing fields into the custom field as seen
below.

In this case we copy the full user name and index it as a text field.
-->
<field name="custom_field" type="text" indexed="true" stored="false" required="false" multiValued="true" />
<copyField source="user_first_name_value_s" dest="custom_field" />
<copyField source="user_last_name_value_s" dest="custom_field" />

<field name="custom_geolocation_field" type="location" indexed="true" stored="false" required="false" />
<field name="custom_geolocation_field_0_coordinate" type="double" indexed="true" stored="false"/>
<field name="custom_geolocation_field_1_coordinate" type="double" indexed="true" stored="false"/>
<copyField source="testtype_maplocation_value_location_gl" dest="custom_geolocation_field" />
13 changes: 13 additions & 0 deletions docker/solr/conf/language-fieldtypes.xml
@@ -0,0 +1,13 @@
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
141 changes: 141 additions & 0 deletions docker/solr/conf/managed-schema
@@ -0,0 +1,141 @@
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE schema [
<!ENTITY langfields SYSTEM "language-fieldtypes.xml">
<!ENTITY customfields SYSTEM "custom-fields-types.xml">
]>
<!--
This is the Solr schema file. This file should be named "schema.xml" and should
be in the conf directory under the solr home (i.e. ./solr/conf/schema.xml by
default) or located where the classloader for the Solr webapp can find it.

It provides the default types and definitions for a functional Solr based
search in eZ Publish 5. You may extend it with your own definitions, but you
should not remove or drastically change the existing definitions.
-->

<schema name="eZ Publish 5 base schema" version="1.5">
<!--
language specific field types are included here, there should be at least
a field type with the name "text" be defined"
Included in the eZ platform distribution are configurations for various
languages, including additional files like stopwords or other features
under the directory "solr.languages"
-->
&langfields;

<!--
custom field types and fields are included from a separate file to ease upgrades
-->
&customfields;

<!--
Default types by Solr. Will be reused for dynamic fields.
-->
<fieldType name="string" class="solr.TextField" sortMissingLast="true">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true" sortMissingLast="true">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>

<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
<fieldType name="pdates" class="solr.DatePointField" docValues="true" multiValued="true"/>
<!--
Numeric field types that index values using KD-trees.
Point fields don't support FieldCache, so they must have docValues="true" if needed for sorting, faceting, functions, etc.
-->
<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
<fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>
<fieldType name="plong" class="solr.LongPointField" docValues="true"/>
<fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>

<fieldType name="pints" class="solr.IntPointField" docValues="true" multiValued="true"/>
<fieldType name="pfloats" class="solr.FloatPointField" docValues="true" multiValued="true"/>
<fieldType name="plongs" class="solr.LongPointField" docValues="true" multiValued="true"/>
<fieldType name="pdoubles" class="solr.DoublePointField" docValues="true" multiValued="true"/>
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>

<fieldType name="identifier" class="solr.StrField" sortMissingLast="true" />
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" multiValued="false"/>
<fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>
<fieldtype name="binary" class="solr.BinaryField"/>
<fieldType name="int" class="solr.IntPointField" docValues="true"/>
<fieldType name="float" class="solr.FloatPointField" docValues="true"/>
<fieldType name="long" class="solr.LongPointField" docValues="true"/>
<fieldType name="double" class="solr.DoublePointField" docValues="true"/>
<fieldType name="date" class="solr.DatePointField" docValues="true"/>

<fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
<fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

<!--
Required ID field.
-->
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<!--
Always contains the date a document was added to the index. Might be
useful.
-->
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>

<!--
Points to the root document of a block of nested documents. Required for nested document support.
-->
<field name="_root_" type="string" indexed="true" stored="true" required="false"/>

<field name="document_type_id" type="string" indexed="true" stored="true" required="true"/>

<!--
Dynamic field definitions. If a field name is not found, dynamicFields
will be used if the name matches any of the patterns. RESTRICTION: the
glob-like pattern in the name attribute must have a "*" only at the start
or the end. EXAMPLE: name="*_i" will match any field ending in _i (like
myid_i, z_i) Longer patterns will be matched first. if equal size
patterns both match, the first appearing in the schema will be used.
-->
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_mi" type="int" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_id" type="identifier" indexed="true" stored="true"/>
<dynamicField name="*_mid" type="identifier" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_ms" type="string" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_l" type="long" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text" indexed="true" stored="true" multiValued="true" omitNorms="false"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
<dynamicField name="*_mb" type="boolean" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_f" type="float" indexed="true" stored="true"/>
<dynamicField name="*_d" type="double" indexed="true" stored="true"/>
<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
<dynamicField name="*_gl" type="location" indexed="true" stored="true"/>
<dynamicField name="*_gl_0_coordinate" type="double" indexed="true" stored="true"/>
<dynamicField name="*_gl_1_coordinate" type="double" indexed="true" stored="true"/>

<!--
This field is required to allow random sorting
-->
<dynamicField name="random*" type="random" indexed="true" stored="false"/>

<!--
This field is required since Solr 4
-->
<field name="_version_" type="long" indexed="true" stored="true" multiValued="false" />

<uniqueKey>id</uniqueKey>
</schema>
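
The matching rule stated in the dynamic-field comment of this schema (longer patterns first, and on equal length the pattern defined first) can be sketched in plain shell. This only illustrates the rule as the comment describes it and is not Solr's implementation; the function name `match_dynamic_field` and the chosen subset of patterns are assumptions made for the example.

```shell
#!/bin/sh
# A subset of the dynamicField patterns from the schema, in schema order.
patterns='*_i *_id *_s *_gl *_gl_0_coordinate random*'

# Print the pattern that would apply to a field name: the longest matching
# pattern wins; on a tie the pattern listed first is kept. Prints an empty
# line when nothing matches.
match_dynamic_field() {
  name=$1
  best=''
  set -f                      # stop the shell from expanding "*" against files
  for pattern in $patterns; do
    case $name in
      $pattern)               # unquoted, so it is treated as a glob pattern
        if [ "${#pattern}" -gt "${#best}" ]; then
          best=$pattern
        fi
        ;;
    esac
  done
  set +f
  printf '%s\n' "$best"
}

match_dynamic_field random_s               # prints: random*
match_dynamic_field place_gl_0_coordinate  # prints: *_gl_0_coordinate
```

The name `random_s` matches both `*_s` and `random*`; under the stated rule the longer pattern, `random*`, is the one that applies.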
14 changes: 14 additions & 0 deletions docker/solr/conf/solr.languages/ar/language-fieldtypes.xml
@@ -0,0 +1,14 @@
<!-- Arabic -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<!-- for any non-arabic -->
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_ar.txt" />
<!-- normalizes ﻯ to ﻱ, etc -->
<filter class="solr.ArabicNormalizationFilterFactory"/>
<filter class="solr.ArabicStemFilterFactory"/>
</analyzer>
</fieldType>


125 changes: 125 additions & 0 deletions docker/solr/conf/solr.languages/ar/stopwords_ar.txt
@@ -0,0 +1,125 @@
# This file was created by Jacques Savoy and is distributed under the BSD license.
# See http://members.unine.ch/jacques.savoy/clef/index.html.
# Also see http://www.opensource.org/licenses/bsd-license.html
# Cleaned on October 11, 2009 (not normalized, so use before normalization)
# This means that when modifying this list, you might need to add some
# redundant entries, for example containing forms with both أ and ا
من
ومن
منها
منه
في
وفي
فيها
فيه
و
ف
ثم
او
أو
ب
بها
به
ا
أ
اى
اي
أي
أى
لا
ولا
الا
ألا
إلا
لكن
ما
وما
كما
فما
عن
مع
اذا
إذا
ان
أن
إن
انها
أنها
إنها
انه
أنه
إنه
بان
بأن
فان
فأن
وان
وأن
وإن
التى
التي
الذى
الذي
الذين
الى
الي
إلى
إلي
على
عليها
عليه
اما
أما
إما
ايضا
أيضا
كل
وكل
لم
ولم
لن
ولن
هى
هي
هو
وهى
وهي
وهو
فهى
فهي
فهو
انت
أنت
لك
لها
له
هذه
هذا
تلك
ذلك
هناك
كانت
كان
يكون
تكون
وكانت
وكان
غير
بعض
قد
نحو
بين
بينما
منذ
ضمن
حيث
الان
الآن
خلال
بعد
قبل
حتى
عند
عندما
لدى
جميع
12 changes: 12 additions & 0 deletions docker/solr/conf/solr.languages/cjk/language-fieldtypes.xml
@@ -0,0 +1,12 @@
<!-- CJK bigram -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<!-- normalize width before bigram, as e.g. half-width dakuten combine -->
<filter class="solr.CJKWidthFilterFactory"/>
<!-- for any non-CJK -->
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.CJKBigramFilterFactory"/>
</analyzer>
</fieldType>

13 changes: 13 additions & 0 deletions docker/solr/conf/solr.languages/de/language-fieldtypes.xml
@@ -0,0 +1,13 @@
<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_de.txt" format="snowball" />
<filter class="solr.GermanNormalizationFilterFactory"/>
<filter class="solr.GermanLightStemFilterFactory"/>
<!-- less aggressive: <filter class="solr.GermanMinimalStemFilterFactory"/> -->
<!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="German2"/> -->
</analyzer>
</fieldType>

