David's Notes
Thursday, August 10, 2017
Cleaning up an image for OCR with ImageMagick
convert input.png -colorspace gray -type grayscale -contrast-stretch 0 -clone 0 -colorspace gray -negate -contrast-stretch 0 -compose copy_opacity -composite -fill "white" -opaque none +matte -deskew 40% -sharpen 0x1 output.png
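To run the same cleanup over a whole folder of scans, the command above can be wrapped in a small function. This is only a sketch: clean_for_ocr, the DRY_RUN variable, the ./scans directory and the "-clean" suffix are all assumptions made for illustration, not part of ImageMagick.

```shell
#!/bin/sh
# Hypothetical helper: runs the cleanup command from this post on one file.
# With DRY_RUN set to a non-empty value the command is printed, not executed.
clean_for_ocr() {
    in=$1
    out=$2
    ${DRY_RUN:+echo} convert "$in" -colorspace gray -type grayscale -contrast-stretch 0 \
        -clone 0 -colorspace gray -negate -contrast-stretch 0 \
        -compose copy_opacity -composite -fill "white" -opaque none +matte \
        -deskew 40% -sharpen 0x1 "$out"
}

# Process every PNG in ./scans (assumed directory), skipping earlier results.
for f in ./scans/*.png; do
    [ -e "$f" ] || continue                  # glob matched nothing
    case $f in *-clean.png) continue ;; esac # already processed
    clean_for_ocr "$f" "${f%.png}-clean.png"
done
```

Running it once with DRY_RUN=1 shows the commands that would be executed before any file is written.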
Monday, December 22, 2014
Installing MariaDB with yum
Installing MariaDB & Tokudb with yum
----------------------------------------------------------------
nano /etc/yum.repos.d/MariaDB.repo
Insert the following text:
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.0/centos6-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
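If you prefer not to open an editor, the same repo definition can be written with a here-document. A minimal sketch; write_mariadb_repo is a hypothetical helper name, and the target path is passed in so it can be tried on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: writes the MariaDB repo definition to the given path.
write_mariadb_repo() {
    cat > "$1" <<'EOF'
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.0/centos6-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
EOF
}

# As root, for the real location used in this post:
#   write_mariadb_repo /etc/yum.repos.d/MariaDB.repo
```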
yum install MariaDB-server MariaDB-client
chkconfig --add mysql
service mysql start
Edit the file /etc/my.cnf.d/tokudb.cnf
and uncomment the line:
plugin-load-add=ha_tokudb.so
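The uncommenting step can also be done with sed instead of an editor. This is a sketch that assumes GNU sed (for -i without a suffix) and that the line is commented out with a single leading "#"; enable_tokudb is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the TokuDB plugin line in the given config file.
enable_tokudb() {
    # Remove a leading "#" and any spaces after it, on the plugin-load-add line only.
    sed -i 's/^#[[:space:]]*\(plugin-load-add=ha_tokudb\.so\)/\1/' "$1"
}

# As root, for the file named in this post:
#   enable_tokudb /etc/my.cnf.d/tokudb.cnf
```

The server most likely needs a restart (service mysql restart) before the plugin is loaded.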
TokuDB requires Transparent Hugepages to be disabled. You can check their status as follows:
cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
The value in brackets is the active setting.
If the path does not exist, they are not enabled and you may continue.
To disable them, pass "transparent_hugepage=never" to the kernel in your bootloader (grub, lilo, etc.). For example, for SUSE, add
transparent_hugepage=never
to Optional Kernel Command Line Parameter at the end, such as after "showopts", and press OK. The setting will take effect on the next reboot.
You can also disable them on the running system (the setting is lost on reboot) with:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
For more information, see http://unix.stackexchange.com/questions/99154/disable-transparent-hugepages
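The check described above can be scripted so that only the active setting is printed. A small sketch; thp_status is a hypothetical helper, and the optional argument exists only so it can be pointed at a test file.

```shell
#!/bin/sh
# Hypothetical helper: print the active Transparent Hugepages setting,
# i.e. the word shown in brackets, or "absent" when the path does not exist.
thp_status() {
    f=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    if [ ! -e "$f" ]; then
        echo "absent"
        return 0
    fi
    # Keep only the bracketed word, e.g. "[always] madvise never" -> "always".
    sed 's/.*\[\([a-z]*\)\].*/\1/' "$f"
}

# Example:
#   [ "$(thp_status)" = "always" ] && echo "THP still enabled"
```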
Thursday, September 18, 2014
How to start the Apache Tika server daemon at boot (add it as a service)
Create the file /etc/init.d/tika:
nano /etc/init.d/tika
Insert the following content:
#!/bin/sh
# Starts, stops, and restarts Tika Server.
#
# chkconfig: 35 92 08
# description: Starts and stops Tika server
# Written by David Braslavsky

TIKA_DIR="/usr/local/src/tika-1.5/tika-server/target/"
JAVA_OPTIONS="-Xmx2048m -jar tika-server-1.5.jar --hostname=hostname --port=9998"
LOG_FILE="/var/log/tika.log"
JAVA="/usr/bin/java"

case $1 in
    start)
        echo "Starting Tika"
        cd "$TIKA_DIR" || exit 1
        # JAVA_OPTIONS is left unquoted so it splits into separate arguments
        $JAVA $JAVA_OPTIONS 2> "$LOG_FILE" &
        ;;
    stop)
        echo "Stopping Tika server"
        # The [t] keeps grep from matching its own process
        pid=`ps aux | grep '[t]ika-server' | awk '{print $2}'`
        if [ -n "$pid" ]; then
            kill -9 $pid
        fi
        ;;
    restart)
        $0 stop
        sleep 1
        $0 start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}" >&2
        exit 1
        ;;
esac
chmod +x /etc/init.d/tika
chkconfig --add tika
service tika start
Thursday, September 4, 2014
Install TIKA and MAVEN
Install MAVEN
Unix-based Operating Systems (Linux, Solaris and Mac OS X)
- Extract the distribution archive, i.e. apache-maven-3.2.3-bin.tar.gz to the directory you wish to install Maven 3.2.3. These instructions assume you chose /usr/local/apache-maven. The subdirectory apache-maven-3.2.3 will be created from the archive.
- In a command terminal, add the M2_HOME environment variable, e.g. export M2_HOME=/usr/local/apache-maven/apache-maven-3.2.3.
- Add the M2 environment variable, e.g. export M2=$M2_HOME/bin.
- Optional: Add the MAVEN_OPTS environment variable to specify JVM properties, e.g. export MAVEN_OPTS="-Xms256m -Xmx512m". This environment variable can be used to supply extra options to Maven.
- Add M2 environment variable to your path, e.g. export PATH=$M2:$PATH.
- Make sure that JAVA_HOME is set to the location of your JDK, e.g. export JAVA_HOME=/usr/java/jdk1.7.0_51 and that $JAVA_HOME/bin is in your PATH environment variable.
- Run mvn --version to verify that it is correctly installed.
source: http://maven.apache.org/download.cgi
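Taken together, the steps above amount to a few export lines, for example in ~/.bash_profile. This is a sketch that reuses the example paths from the list; both the Maven directory and the JDK location are assumptions and need adjusting to your system.

```shell
#!/bin/sh
# Paths are the examples from the steps above, not detected values.
export M2_HOME=/usr/local/apache-maven/apache-maven-3.2.3
export M2="$M2_HOME/bin"
export MAVEN_OPTS="-Xms256m -Xmx512m"    # optional JVM properties
export PATH="$M2:$PATH"

export JAVA_HOME=/usr/java/jdk1.7.0_51   # assumed JDK location
export PATH="$JAVA_HOME/bin:$PATH"

# Verify the installation only if mvn can actually be found.
if command -v mvn >/dev/null 2>&1; then
    mvn --version || echo "mvn found but failed to run; check JAVA_HOME" >&2
else
    echo "mvn not found on PATH; check M2_HOME" >&2
fi
```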
Install TIKA
wget http://mirror.vorboss.net/apache/tika/tika-x.x-src.zip
unzip tika-x.x-src
cd ./tika-x.x/
mvn install
cd ./tika-server/target/
java -jar tika-server-x.x.jar
source: http://wiki.apache.org/tika/TikaJAXRS
Thursday, September 5, 2013
Send a file as an email attachment using Linux command line
yum install mutt
echo "This is the message body" | mutt -a "/path/to/file.to.attach" -s "subject of message" -- recipient@domain.com
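When this is used from a script, it may help to check the attachment before calling mutt, so a missing file does not produce an email without it. A sketch only; send_attachment is a hypothetical wrapper around the command above, and the message body is the same placeholder text.

```shell
#!/bin/sh
# Hypothetical wrapper: send_attachment FILE SUBJECT RECIPIENT
send_attachment() {
    file=$1
    subject=$2
    recipient=$3
    # Refuse to send when the attachment cannot be read.
    if [ ! -r "$file" ]; then
        echo "attachment not readable: $file" >&2
        return 1
    fi
    echo "This is the message body" | mutt -a "$file" -s "$subject" -- "$recipient"
}

# Example:
#   send_attachment /path/to/file.to.attach "subject of message" recipient@domain.com
```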
Friday, March 15, 2013
Monday, October 15, 2012
Google Algorithm - What are the 200 Variables?
At PubCon, Matt Cutts mentioned that there were over 200 variables in the Google Algorithm.
I thought I’d start a list...
Domain
- Age of Domain
- History of domain
- KWs in domain name
- Sub domain or root domain?
- TLD of Domain
- IP address of domain
- Location of IP address / Server
Architecture
- HTML structure
- Use of header tags
- URL path
- Use of external CSS / JS files
Content
- Keyword density of page
- Keyword in Title Tag
- Keyword in Meta Description (Not Meta Keywords)
- Keyword in header tags (H1, H2, etc.)
- Keyword in body text
- Freshness of Content
Per Inbound Link
- Quality of website linking in
- Quality of web page linking in
- Age of website
- Age of web page
- Relevancy of page’s content
- Location of link (Footer, Navigation, Body text)
- Anchor text of link
- Title attribute of link
- Alt tag of images linking
- Country specific TLD domain
- Authority TLD (.edu, .gov)
- Location of server
- Authority Link (CNN, BBC, etc)
Cluster of Links
- Uniqueness of Class C address.
Internal Cross Linking
- Number of internal links to page
- Location of link on page
- Anchor text of FIRST text link (Bruce Clay’s point at PubCon)
Penalties
- Over Optimisation
- Purchasing Links
- Selling Links
- Comment Spamming
- Cloaking
- Hidden Text
- Duplicate Content
- Keyword stuffing
- Manual penalties
- Sandbox effect (Probably the same as age of domain)
Miscellaneous
- JavaScript Links
- No Follow Links
Pending
- Performance / Load of a website
- Speed of JS
Misconceptions
- XML Sitemap (Aids the crawler but doesn’t help rankings)
- PageRank (General Indicator of page’s performance)
Source