Thursday, August 10, 2017

Cleaning up an image for OCR with ImageMagick

convert input.png -colorspace gray -type grayscale -contrast-stretch 0 \
  \( -clone 0 -colorspace gray -negate -contrast-stretch 0 \) \
  -compose copy_opacity -composite -fill "white" -opaque none +matte \
  -deskew 40% -sharpen 0x1 output.png
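Here is the same pipeline as a small bash function, with each stage commented. This is a sketch. The name clean_for_ocr is hypothetical, the comments are my reading of the ImageMagick 6 options, and it assumes the clone stage belongs inside parentheses so that only the copy gets negated and used as the opacity mask.

```bash
#!/usr/bin/env bash
# clean_for_ocr IN OUT
# Hypothetical wrapper around the ImageMagick 6 "convert" pipeline above,
# with one stage per line so each step can be commented.
clean_for_ocr() {
    local in="$1" out="$2"
    local -a args
    args=(
        "$in"
        -colorspace gray -type grayscale   # reduce to grayscale, write a grayscale file
        -contrast-stretch 0                # stretch intensities to the full range
        '(' -clone 0                       # work on a copy of the image...
            -colorspace gray -negate       # ...inverted, so dark text becomes bright
            -contrast-stretch 0
        ')'
        -compose copy_opacity -composite   # use the inverted copy as the alpha mask
        -fill white -opaque none           # paint fully transparent pixels white
        +matte                             # drop the alpha channel again
        -deskew 40%                        # straighten a slightly rotated scan
        -sharpen 0x1                       # mild sharpening of the glyph edges
        "$out"
    )
    convert "${args[@]}"
}
```

Usage: clean_for_ocr scan.png scan-clean.png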

Monday, December 22, 2014

Installing MariaDB with yum

Installing MariaDB & TokuDB with yum

----------------------------------------------------------------
nano /etc/yum.repos.d/MariaDB.repo

Insert the following text:

[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.0/centos6-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
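You can also write the repository definition in one step instead of editing it in nano. This is a minimal sketch. write_mariadb_repo is a hypothetical helper name, and the destination defaults to the file above, which requires root to write.

```bash
#!/usr/bin/env bash
# write_mariadb_repo [DEST]
# Hypothetical helper: write the MariaDB 10.0 / CentOS 6 repo definition
# shown above to DEST without opening an editor.
write_mariadb_repo() {
    local dest="${1:-/etc/yum.repos.d/MariaDB.repo}"
    # Quoted 'EOF' keeps the text literal (no variable expansion).
    cat > "$dest" <<'EOF'
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.0/centos6-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
EOF
}
```

Usage (as root): write_mariadb_repo, then run the yum install below.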

yum install MariaDB-server MariaDB-client
chkconfig --add mysql

service mysql start


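Edit /etc/my.cnf.d/tokudb.cnf and uncomment this line:

plugin-load-add=ha_tokudb.so

TokuDB will not start while Transparent Hugepages are enabled, so disable them first as described below. Then restart the server (service mysql restart) so the plugin is loaded.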
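You can also script the uncommenting. This sketch assumes the shipped line is commented out with a leading "#" and relies on GNU sed (the default on CentOS) for in-place editing. enable_tokudb is a hypothetical name.

```bash
#!/usr/bin/env bash
# enable_tokudb [FILE]
# Hypothetical helper: uncomment "plugin-load-add=ha_tokudb.so" in FILE.
# Assumes the line is commented with "#"; sed -i is the GNU form.
enable_tokudb() {
    local cnf="${1:-/etc/my.cnf.d/tokudb.cnf}"
    sed -i 's/^#[[:space:]]*\(plugin-load-add=ha_tokudb\.so\)/\1/' "$cnf"
    # Return non-zero if the active line is still missing after the edit.
    grep -q '^plugin-load-add=ha_tokudb\.so' "$cnf"
}
```

After restarting, mysql -e 'SHOW ENGINES' should list TokuDB.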



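You can check the status of Transparent Hugepages as follows:
cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
The value in brackets is the active setting, so the output above means they are enabled. If the path does not exist, they are not enabled and you may continue. However, CentOS/RHEL 6 kernels use /sys/kernel/mm/redhat_transparent_hugepage/ instead, so check that path too before concluding they are disabled.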
To disable them permanently, pass "transparent_hugepage=never" to the kernel in your bootloader (grub, lilo, etc.). For example, on SUSE, add transparent_hugepage=never at the end of the Optional Kernel Command Line Parameter field (for instance after "showopts") and press OK. On CentOS 6, append it to the kernel line in /boot/grub/grub.conf. The setting takes effect on the next reboot.
To disable them immediately without rebooting, run the following as root (the change is lost on the next reboot):
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
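The check and the runtime switch can be wrapped in two small functions. This is a minimal sketch. thp_mode and thp_disable are hypothetical names. The directory is passed in so the same functions work on the CentOS/RHEL 6 path /sys/kernel/mm/redhat_transparent_hugepage, and thp_disable requires root on the real sysfs files. Like the echo commands, it does not survive a reboot. One common approach is to call it from /etc/rc.local.

```bash
#!/usr/bin/env bash
# thp_mode FILE: print the active (bracketed) value of a THP setting file,
# e.g. "always" for "[always] madvise never".
thp_mode() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# thp_disable DIR: write "never" to DIR/enabled and DIR/defrag.
# Needs root for the real files under /sys; lost on reboot.
thp_disable() {
    local dir="$1"
    echo never > "$dir/enabled" || return 1
    echo never > "$dir/defrag" || return 1
}

# Report the current mode for whichever of the two known paths exists.
for dir in /sys/kernel/mm/transparent_hugepage \
           /sys/kernel/mm/redhat_transparent_hugepage; do
    if [ -r "$dir/enabled" ]; then
        echo "$dir: $(thp_mode "$dir/enabled")"
    fi
done
```

Usage (as root): thp_disable /sys/kernel/mm/transparent_hugepage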






Thursday, September 18, 2014

How to start the Apache Tika server at boot (as a service)

Create the file /etc/init.d/tika:

nano /etc/init.d/tika

Insert the following content:


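#!/bin/sh
# Starts, stops, and restarts Tika Server.
#
# chkconfig: 35 92 08
# description: Starts and stops Tika server
# Written by David Braslavsky

TIKA_DIR="/usr/local/src/tika-1.5/tika-server/target/"
# Replace "hostname" below with the host name or IP address Tika should listen on.
JAVA_OPTIONS="-Xmx2048m -jar tika-server-1.5.jar --hostname=hostname --port=9998"
LOG_FILE="/var/log/tika.log"
PID_FILE="/var/run/tika.pid"
JAVA="/usr/bin/java"

case "$1" in
    start)
        echo "Starting Tika"
        cd "$TIKA_DIR" || exit 1
        # JAVA_OPTIONS is unquoted on purpose so it splits into separate arguments.
        # Append both stdout and stderr to the log and remember the PID for "stop".
        $JAVA $JAVA_OPTIONS >> "$LOG_FILE" 2>&1 &
        echo $! > "$PID_FILE"
        ;;
    stop)
        echo "Stopping Tika server"
        # Kill only the recorded process; "ps aux | grep tika-server" also
        # matched the grep command itself.
        if [ -f "$PID_FILE" ]; then
            kill "$(cat "$PID_FILE")"
            rm -f "$PID_FILE"
        fi
        ;;
    restart)
        "$0" stop
        sleep 1
        "$0" start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}" >&2
        exit 1
        ;;
esac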


chmod +x /etc/init.d/tika

chkconfig --add tika

service tika start
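The init script returns as soon as java is forked, but the server needs a few seconds before it accepts requests. Here is a hedged sketch of a wait loop. It assumes bash (it uses bash's /dev/tcp redirection) and that the port to watch is the 9998 from JAVA_OPTIONS. wait_for_port is a hypothetical name. It only checks that something accepts TCP connections on the port, not that Tika answers correctly.

```bash
#!/usr/bin/env bash
# wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: poll once per second until HOST:PORT accepts a TCP
# connection. Returns 0 on success, 1 after TIMEOUT_SECONDS attempts.
wait_for_port() {
    local host="$1" port="$2" timeout="${3:-30}" i
    for ((i = 0; i < timeout; i++)); do
        # Opening /dev/tcp/HOST/PORT in a subshell attempts a connection.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}
```

Usage: service tika start && wait_for_port localhost 9998 60 && echo "Tika is listening". Use the same host you passed to --hostname.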


Thursday, September 4, 2014

Install Tika and Maven

Install Maven

Unix-based Operating Systems (Linux, Solaris and Mac OS X)

  1. Extract the distribution archive, i.e. apache-maven-3.2.3-bin.tar.gz to the directory you wish to install Maven 3.2.3. These instructions assume you chose /usr/local/apache-maven. The subdirectory apache-maven-3.2.3 will be created from the archive.
  2. In a command terminal, add the M2_HOME environment variable, e.g. export M2_HOME=/usr/local/apache-maven/apache-maven-3.2.3.
  3. Add the M2 environment variable, e.g. export M2=$M2_HOME/bin.
  4. Optional: Add the MAVEN_OPTS environment variable to specify JVM properties, e.g. export MAVEN_OPTS="-Xms256m -Xmx512m". This environment variable can be used to supply extra options to Maven.
  5. Add M2 environment variable to your path, e.g. export PATH=$M2:$PATH.
  6. Make sure that JAVA_HOME is set to the location of your JDK, e.g. export JAVA_HOME=/usr/java/jdk1.7.0_51 and that $JAVA_HOME/bin is in your PATH environment variable.
  7. Run mvn --version to verify that it is correctly installed.
source: http://maven.apache.org/download.cgi
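The exports in steps 2 to 6 only last for the current terminal session. One way to make them permanent is to collect them in a system-wide profile script such as /etc/profile.d/maven.sh. That file name is my choice, not from the Maven page, and the paths are the example values from the steps. Step 1 itself can be done with tar -xzf apache-maven-3.2.3-bin.tar.gz -C /usr/local/apache-maven.

```bash
# /etc/profile.d/maven.sh (assumed location; ~/.bash_profile also works)
# Collects the environment settings from steps 2-6 so new logins get them.
export M2_HOME=/usr/local/apache-maven/apache-maven-3.2.3
export M2="$M2_HOME/bin"
export MAVEN_OPTS="-Xms256m -Xmx512m"    # optional, step 4
export JAVA_HOME=/usr/java/jdk1.7.0_51
export PATH="$M2:$JAVA_HOME/bin:$PATH"   # steps 5 and 6
```

After logging in again, mvn --version should work as in step 7.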



Install Tika


wget http://mirror.vorboss.net/apache/tika/tika-x.x-src.zip
unzip tika-x.x-src.zip
cd ./tika-x.x/
mvn install
cd ./tika-server/target/
java -jar tika-server-x.x.jar





source: http://wiki.apache.org/tika/TikaJAXRS
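Once the server is running, you send documents to it over HTTP. The TikaJAXRS page above describes a PUT to /tika that returns the extracted text. The sketch below wraps that request in a function. tika_text is a hypothetical name, and the default URL assumes the server listens on localhost:9998.

```bash
#!/usr/bin/env bash
# tika_text FILE [URL]
# Hypothetical helper: PUT a document to a running tika-server and print
# the extracted plain text.
tika_text() {
    local file="$1" url="${2:-http://localhost:9998/tika}"
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    # -T uploads the file with an HTTP PUT; ask for plain text back.
    curl -sS -T "$file" -H "Accept: text/plain" "$url"
}
```

Usage: tika_text report.pdf > report.txt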

Thursday, September 5, 2013

Send a file as an email attachment using Linux command line

yum install mutt

echo "This is the message body" | mutt -s "subject of message" -a "/path/to/file.to.attach" -- recipient@domain.com
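If you send files this way often, a small wrapper can check that the file is readable before calling mutt. mail_file is a hypothetical name, not part of mutt. It keeps -a as the last option before "--". The mutt manual asks for this because -a accepts several files, and "--" marks where the recipients begin.

```bash
#!/usr/bin/env bash
# mail_file FILE SUBJECT RECIPIENT [BODY]
# Hypothetical wrapper around the mutt command above.
mail_file() {
    local file="$1" subject="$2" to="$3" body="${4:-See the attached file.}"
    if [ ! -r "$file" ]; then
        echo "cannot read: $file" >&2
        return 1
    fi
    # -a goes last; "--" ends the attachment list so the recipient
    # is not mistaken for another file to attach.
    printf '%s\n' "$body" | mutt -s "$subject" -a "$file" -- "$to"
}
```

Usage: mail_file /var/log/report.csv "Weekly report" recipient@domain.com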

Monday, October 15, 2012

Google Algorithm - What are the 200 Variables?

At PubCon, Matt Cutts mentioned that there were over 200 variables in the Google Algorithm.
I thought I’d start a list...
Domain
- Age of Domain
- History of domain
- KWs in domain name
- Sub domain or root domain?
- TLD of Domain
- IP address of domain
- Location of IP address / Server

Architecture
- HTML structure
- Use of header tags
- URL path
- Use of external CSS / JS files

Content
- Keyword density of page
- Keyword in Title Tag
- Keyword in Meta Description (Not Meta Keywords)
- Keyword in header tags (H1, H2, etc.)
- Keyword in body text
- Freshness of Content

Per Inbound Link
- Quality of website linking in
- Quality of web page linking in
- Age of website
- Age of web page
- Relevancy of page’s content
- Location of link (Footer, Navigation, Body text)
- Anchor text of link
- Title attribute of link
- Alt text of linking images
- Country-specific TLD
- Authority TLD (.edu, .gov)
- Location of server
- Authority Link (CNN, BBC, etc)

Cluster of Links
- Uniqueness of Class C address

Internal Cross Linking
- Number of internal links to page
- Location of link on page
- Anchor text of FIRST text link (Bruce Clay’s point at PubCon)

Penalties
- Over Optimisation
- Purchasing Links
- Selling Links
- Comment Spamming
- Cloaking
- Hidden Text
- Duplicate Content
- Keyword stuffing
- Manual penalties
- Sandbox effect (Probably the same as age of domain)

Miscellaneous
- JavaScript Links
- No Follow Links

Pending
- Performance / Load of a website
- Speed of JS

Misconceptions
- XML Sitemap (Aids the crawler but doesn’t help rankings)
- PageRank (General Indicator of page’s performance) 


Source